Abstract

A definition of essential independence is proposed for sequences of polytomous items. 
For items satisfying the reasonable assumption that the expected amount of credit awarded 
increases with examinee ability, we develop a theory of essential unidimensionality which 
closely parallels that of Stout. Essentially unidimensional item sequences can be shown
to have a unique (up to change of scale) dominant underlying trait, which can be
consistently estimated by a monotone transformation of the sum of the item scores. In more
general polytomous-response latent trait models (with or without ordered responses), an
M-estimator based upon maximum likelihood may be shown to be consistent for $\theta$ under
essentially unidimensional violations of local independence and a variety of
monotonicity/identifiability conditions. A rigorous proof of this fact is given, and the standard error
of the estimator is explored. These results suggest that ability estimation methods that
rely on the summation form of the log-likelihood under local independence should generally
be robust under essential independence, but standard errors may vary greatly from what
is usually expected, depending on the degree of departure from local independence. An
index of departure from local independence is also proposed.

KEY WORDS: item response theory (IRT), polytomous item responses, essential
independence, unidimensionality, latent trait identifiability, likelihood-based trait estimation,
asymptotic standard errors, structural robustness, local dependence.
1. Introduction

In the usual binary or dichotomous response formulation of item response theory
(IRT), the correctness of the $j$th item in a test or item sequence is indicated by a (random)
response variable $X_j$ taking on the value 1 for correct responses and the value 0 for
incorrect responses. This codes the examinee's response with the score we wish to assign to that
response. In considering polytomous data, it is convenient to treat the coding and scoring
operations separately. For the polytomous item we will code $n$ possible response
categories with the arbitrary labels $x_{j0}, x_{j1}, \ldots, x_{j(n-1)}$, and indicate the examinee's response
with the (random) response variable $X_j$.

For convenience in scoring the item, it is also useful to have a set of binary response
variables

$$Y_{jm} = \begin{cases} 1 & \text{if } X_j = x_{jm}, \\ 0 & \text{else.} \end{cases}$$

Note that for each $j$, $Y_{j0} + Y_{j1} + \cdots + Y_{j(n-1)} = 1$, and that any item scoring method $A_j$
that assigns the numerical score $a_{jm}$ to the category $x_{jm}$ may be expressed in terms of the
$Y$'s as

$$A_j = \sum_{m=0}^{n-1} a_{jm} Y_{jm}.$$

Finally, let $\mathbf{X}_J = (X_1, X_2, \ldots, X_J)$ be the vector of item responses on a test of length $J$
given by a randomly-chosen examinee, and let $\mathbf{x}_J = (x_1, x_2, \ldots, x_J)$ denote any particular
instance of $\mathbf{X}_J$.
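As a small illustration of keeping coding and scoring separate (a minimal sketch; the four-category item, the chosen category, and both scoring schemes are hypothetical), the indicators $Y_{jm}$ form a one-hot vector and any scoring scheme $\{a_{jm}\}$ is applied to it as a dot product:

```python
def indicators(response_index, n):
    """Binary indicators Y_jm (m = 0..n-1): 1 for the chosen category, 0 else."""
    return [1 if m == response_index else 0 for m in range(n)]


def item_score(y, a):
    """Item score A_j = sum_m a_jm * Y_jm for the scoring scheme a = (a_j0, ..., a_j(n-1))."""
    if sum(y) != 1:
        raise ValueError("exactly one indicator Y_jm must equal 1")
    return sum(a_m * y_m for a_m, y_m in zip(a, y))


# Hypothetical four-category item; the examinee chose category x_j2.
y = indicators(2, 4)
print(y)                             # [0, 0, 1, 0]
print(item_score(y, [0, 1, 2, 3]))   # partial-credit scoring: 2
print(item_score(y, [0, 0, 0, 1]))   # only the top category earns credit: 0
```

Changing the scoring scheme changes $A_j$ but leaves the coded response untouched.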

The general form of an IRT model for $\mathbf{X}_J$ may then be expressed as

$$P[\mathbf{X}_J = \mathbf{x}_J] = \int P[\mathbf{X}_J = \mathbf{x}_J \mid \Theta = \theta]\, f(\theta)\, d\theta. \qquad (1)$$



We follow Thissen and Steinberg (1986) in considering $\Theta = (\Theta_1, \ldots, \Theta_d)$, the latent trait
or trait vector, to be a random variable (vector); thus, $f(\theta)$ is this variable's probability
density function for the population in question. The traditional IRT assumption of local
independence reads, for polytomous item response models,

$$P[\mathbf{X}_J = \mathbf{x}_J \mid \Theta = \theta] = \prod_{j=1}^{J} \prod_{m=0}^{n-1} P_{jm}(\theta)^{y_{jm}}, \qquad \text{(LI)}$$

where the $y_{jm}$ are observed values of $Y_{jm}$ corresponding to each $x_j$, and
$P_{jm}(\theta) = P[X_j = x_{jm} \mid \Theta = \theta]$ are the response characteristic functions or, when $d = 1$, response characteristic
curves (RCC's). There is no natural monotonicity assumption for general polytomous
models, although for those cases in which the responses are ordered from least correct to
most correct as $m$ increases, it seems reasonable to require that

$$P^*_{jm}(\theta) = \sum_{k=m}^{n-1} P_{jk}(\theta) \ \text{ is nondecreasing in } \theta \text{ for all } j, m, \qquad \text{(M)}$$

that is, nondecreasing in each coordinate of $\theta$ with the other coordinates held fixed (these
cumulative response functions are considered by, for example, Samejima, 1972). Note
that $P^*_{jm}(\theta) = P[\text{response } m \text{ or greater} \mid \theta]$ is the binary item response function one would
obtain by dichotomizing the item so that response $m$ or greater is scored as 1 (correct) and
any lower response is scored as 0 (incorrect). When LI and M both hold for a $d$-dimensional
trait $\Theta$, we will write $d_L$ for $d$. We will be concerned mostly with $d_L = 1$ models in what
follows.
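To make the RCCs and condition M concrete, here is a minimal sketch assuming a logistic form of Samejima's graded response model, in which each cumulative function $P^*_{jm}$ ($m \ge 1$) is a logistic curve in $\theta$. The discrimination `a`, the boundaries `b`, and the evaluation grid are hypothetical choices, and a grid check only illustrates M; it does not prove it.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def cumulative(theta, a, b):
    """P*_jm(theta) for m = 0..n-1: P*_j0 = 1, plus one logistic curve for each
    of the n - 1 increasing category boundaries in b (assumed model)."""
    return [1.0] + [logistic(a * (theta - bm)) for bm in b]


def rcc(theta, a, b):
    """Category probabilities P_jm = P*_jm - P*_j(m+1), with P*_jn = 0."""
    cum = cumulative(theta, a, b) + [0.0]
    return [cum[m] - cum[m + 1] for m in range(len(b) + 1)]


def satisfies_M(a, b, grid):
    """Check on a grid that every P*_jm is nondecreasing in theta (condition M)."""
    curves = [cumulative(t, a, b) for t in grid]
    return all(curves[i][m] <= curves[i + 1][m] + 1e-12
               for i in range(len(grid) - 1)
               for m in range(len(b) + 1))


grid = [-4.0 + 0.1 * k for k in range(81)]
b = [-1.0, 0.0, 1.5]                      # n = 4 ordered categories
print(rcc(0.3, 1.2, b))                   # four probabilities summing to 1
print(satisfies_M(1.2, b, grid))          # True: a > 0 gives increasing P*_jm
print(satisfies_M(-1.2, b, grid))         # False: a < 0 reverses the ordering
```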

This paper has two aims. First, we wish to present and explore a definition of essential
independence (EI) for polytomous item response sequences. EI, proposed for binary item
sequences by Stout (1987; 1990), is a weakening of LI that is useful when (as often seems
to be the case in real-life tests) there is a dominant underlying latent trait for the items but
the presence of various minor traits prevents LI from holding exactly. For items satisfying
a condition like M above, the theory of essential unidimensionality and estimation of the
dominant unidimensional latent trait based on raw test score proceeds much as in Stout
(1990). This is the subject of Sections 2 and 3.

Our second aim is to explore maximum likelihood estimates calculated under the
assumption that LI holds when in fact only EI holds. Section 4 contains the basic result:
the MLE calculated under LI remains consistent for $\theta$ under EI, subject only to regularity
conditions and a natural identifiability condition. Thus, maximum likelihood estimation
is robust against this realistic violation of local independence.

Monotone unidimensional local independence models will, and should, continue to be
used as basic psychometric tools since they are attractive to the intuition and lead to
explicit, analytically straightforward likelihoods. However, it is widely accepted that they
oversimplify the latent structure of most tests in the real world. In some situations, the
way the latent structure violates this simple model may be estimated and exploited, but
in many situations it may be impossible or overly expensive to collect the data needed to
ferret out a multidimensional latent structure. The discussion of this issue by Drasgow
and Parsons (1983) is especially relevant here. Essential independence is a way of
characterizing unidimensional stability without knowing the true likelihood function (latent
structure). The importance of the robustness result of Section 4 is that it suggests that
ability estimation methods based on the simple LI model continue to work in situations
in which the latent factors causing strict LI to be violated are sufficiently minor that EI
holds.

Despite this robustness in consistency, there is little robustness in variability. In
Section 5 we consider the standard error of the estimator of Section 4, showing that if
the departure from local independence is great enough, the estimator can fail to have the
usual standard error based on the information function, can fail to converge at the usual
$J^{1/2}$ rate, and can even fail to be asymptotically normally distributed. An index of the
degree of departure from LI is proposed in Section 5 that can be used to calculate the
new standard error. LI-based estimators like the MLE can be expected to be close to
the examinee's $\theta$ under realistic conditions if the test is long, but conventional methods
of assessing the standard errors of the estimates may be misleadingly optimistic in these
same realistic settings.

Gibbons, Bock, and Hedeker (1989) have developed a method of factor analyzing
dichotomous data with correlated specific factors that may be useful to obtain correct
standard error estimates in at least some IRT settings. An indication of how their method
might be used in the present context will be given in Section 5. Wainer and Wright (1980)
have also reported some success using jackknife standard error estimates to account for
extra variation in a $d_L = 1$ Rasch model due to guessing and "sleeping" behavior.

Also important in assessing the standard errors of ability estimators is the uncertainty 
involved in estimating RCC's. Tsutakawa and Soltys (1988) have incorporated RCC un- 
certainty into posterior mean estimator standard errors under LI in the dichotomous case. 
Adapting such methods to the EI setting will be of great importance in eventually
understanding the true error structure of estimated IRT models, but that is beyond our present
scope. 

Although the results of this paper are stated and proved in the polytomous case, it
is expected that they will find greatest application in the dichotomous setting, where IRT
techniques have been most fully developed. For the reader's convenience, the main points
of Sections 4 and 5 are restated for dichotomous responses in Section 6; these results
are also new in the dichotomous case. Finally, Section 7 summarizes the conclusions of
our work, and indicates extensions to other popular LI-based trait estimators, such as the
posterior mode and posterior mean.

2. Essential Independence and Item Sequences 

The notions of essential independence and essential unidimensionality were introduced 
in Stout (1987) and explored in the dichotomous case by Stout (1990) and Junker (1988). In 
the factor analytic tradition, but with a decidedly non-factor-analytic perspective, Stout
seeks a criterion by which only dominant dimensions can be counted. When only one 
dominant dimension is counted, the test is said to be essentially unidimensional. 

The fundamental idea behind essential independence is that a trait vector $\Theta$ is
dominant if, after conditioning on $\Theta$, the residual covariances among the items are small on
average. This parallels the idea, in traditional IRT, that if the latent space is "complete",
then the residual covariances are all zero. A partial answer to the question of how small
the residual covariances must be for $\Theta$ to dominate has been provided by Stout's (1987)
statistical procedure for assessing essential unidimensionality in a fixed, finite set of
dichotomous items. If the residual covariances are small but not zero, $\Theta$ continues to have
many properties of LI latent trait vectors: it is strongly related to the total test score, it
is better and better identified as the test length grows, etc.

To examine properties of $\Theta$ and of $\theta$ estimators as test length grows, it is necessary to
embed the finite test $X_1, \ldots, X_J$ in an infinite collection of items $\mathbf{X}$. For example, results
of Levine (1989) make it clear that not even the distribution of $\Theta$ is completely identifiable
from a finite-length test, let alone particular examinees' $\theta$ vectors. Such an embedding is
implicit even in traditional discussions of IRT trait estimation (e.g., Birnbaum, 1968, pp.
455-457; Lord, 1980, p. 59).

The substantive interpretation of this embedding varies from application to
application. In some settings it may be reasonable to imagine that the process used to generate
the test $X_1, \ldots, X_J$ (which may, for example, involve many item writers and reviewers
generating items of the same character and in the same way) is simply continued to
produce more and more items. Or it may be reasonable to think of $X_1, \ldots, X_J$ as forming a
(stratified) sample from a large item pool, as when test forms are constructed by hand
according to a test specification matrix, or constructed "on the fly" in computerized adaptive
testing (CAT). Other interpretations may also be appropriate.

All such interpretations may be encompassed in the following framework. In practice,
a test form of length $J + 1$ is seldom obtained by simply finding a form of length $J$ and
tacking one more item onto the end of it. Instead, forms of differing lengths, intended to
measure the same construct, will be constructed at different times according to slightly
different design specifications. Thus, in attempting to understand what is meant by letting
the test length $J$ grow, we may consider a sequence of tests

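$$\begin{aligned}
\mathbf{X}_1 &= (X_{11}), \\
\mathbf{X}_2 &= (X_{21}, X_{22}), \\
\mathbf{X}_3 &= (X_{31}, X_{32}, X_{33}), \\
&\ \ \vdots \\
\mathbf{X}_J &= (X_{J1}, X_{J2}, X_{J3}, \ldots, X_{JJ}), \\
&\ \ \vdots
\end{aligned}$$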
in which the test of length $J$ need not be a subtest of the test of length $J + 1$, for any $J$.
The only requirement here is that each test be designed to measure the same construct.
LI and other properties of the traditional IRT model extend in a natural way to such a
sequence of tests by requiring that they hold in every test $\mathbf{X}_J$ in the sequence. We will
abstract the idea that the tests "measure the same construct" by assuming that $\Theta$ is the
same from test to test, and that when an item appears in more than one $\mathbf{X}_J$, it has the
same response curves each time it appears.

This framework allows us to make mathematically rigorous statements about the
identifiability, uniqueness, and estimation of dominant latent traits as test length grows. It
is justifiable insofar as it helps crystallize ideas about finite-length tests with both dominant
and minor dimensions, or suggests ways to improve the analysis of real tests. The sense
in which $\Theta$ is the dominant influence, essential independence, will be carefully defined in
the next section. For now we remark that it is not necessary to arrange the items within
$\mathbf{X}_J$ in any particular order to achieve this. Rather, essential independence requires that
the relative influence of minor factors not included in $\Theta$ be weaker (through cancellation
between items, moderation within items, etc.) in longer tests than in shorter ones.

Formally, this framework leads to a rather messy notation, since it adds a "test
index" $J$ to all quantities under discussion; $a_{jm}$ becomes $a_{Jjm}$, $A_j$ becomes $A_{Jj}$, etc. For
simplicity's sake, we will retain the notation of Section 1 in what follows, and speak
informally of embedding the fixed test $\mathbf{X}_J$ as the first $J$ items in a single infinite item sequence
$\mathbf{X} = (X_1, X_2, X_3, \ldots)$. The reader should bear in mind that the results below also apply
to the more general framework described above.



3. Essential Independence for Polytomous Items 



The traditional approach in IRT is to say that a latent trait (vector) $\Theta$ completely
controls the interesting variation in the item responses if LI and M hold. In contrast,
we would like to be able to determine whether the latent vector $\Theta$ is the dominant
influence underlying the item responses. Moreover, $\Theta$ should dominate regardless of how
the responses are scored. Thus, it is appropriate to consider an arbitrary scoring scheme
$\{a_{jm}\}$ and corresponding item scores $A_j$ subject only to the constraint that there is some
$M < \infty$ such that $|a_{jm}| \le M$ for all $j, m$. All of the scoring schemes considered below
will be bounded in this manner. If $\Theta$ is to be the dominant latent trait vector, we should
at least require that the variation of the raw score, $\bar{A}_J = \frac{1}{J}\sum_{j=1}^{J} A_j$, be small when we
condition on $\Theta$, as $J \to \infty$.

Definition 3.1. The sequence of polytomous items $\mathbf{X}$ is essentially independent (EI) with
respect to the latent trait(s) $\Theta$ if and only if, for every bounded scoring scheme $\{a_{jm}\}$ and
every $\theta$,

$$\lim_{J \to \infty} \binom{J}{2}^{-1} \sum_{1 \le i < j \le J} \big|\mathrm{Cov}(A_i, A_j \mid \Theta = \theta)\big| = 0. \qquad \text{(EI)}$$

This definition of EI for polytomous items, which is equivalent to requiring that
$\lim_{J \to \infty} \mathrm{Var}(\bar{A}_J \mid \Theta = \theta) = 0$ for every bounded scoring scheme, directly generalizes Stout's
definition of strong EI for binary items (Definition 3.5, Stout, 1990). Stout's various
definitions of essential independence are likely not equivalent in general, but they are equivalent
when the residual covariances are nonnegative (as seems plausible in many educational
testing contexts; see the discussion following Theorem 5.1 below). Only the strong EI
definition generalizes naturally to the polytomous case, and for this reason it is preferred
in this paper.
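The contrast between EI and its failure can be seen in a small numerical sketch. It assumes a hypothetical testlet-style model (our illustration, not a model from the text): $J$ identical binary items scored $A_j = X_j$, with $P(X_j = 1 \mid \theta, \gamma) = 1/(1 + e^{-(\theta + \gamma)})$, where items in the same testlet of size `k` share a nuisance trait $\gamma \sim N(0, s^2)$ and items in different testlets are conditionally independent given $\Theta = \theta$. With `k` fixed, the average conditional covariance, and hence $\mathrm{Var}(\bar{A}_J \mid \Theta = \theta)$, vanishes as $J$ grows; when one nuisance trait is shared by all $J$ items, it does not.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def nuisance_moments(theta, sd, nodes=401, width=8.0):
    """E[P] and E[P^2] over gamma ~ N(0, sd^2), with P = logistic(theta + gamma),
    via a normal-weighted Riemann sum on [-width*sd, width*sd]."""
    h = 2.0 * width * sd / (nodes - 1)
    m1 = m2 = wsum = 0.0
    for i in range(nodes):
        g = -width * sd + i * h
        w = math.exp(-0.5 * (g / sd) ** 2)
        p = logistic(theta + g)
        m1 += w * p
        m2 += w * p * p
        wsum += w
    return m1 / wsum, m2 / wsum


def var_mean_score(theta, J, k, sd):
    """Var(A_bar_J | Theta = theta) for J items in testlets of size k (k divides J)."""
    if J % k != 0:
        raise ValueError("k must divide J")
    m1, m2 = nuisance_moments(theta, sd)
    var_item = m1 * (1.0 - m1)    # Var(A_j | theta), the same for every item
    cov_pair = m2 - m1 * m1       # Cov(A_i, A_j | theta) within a testlet
    n_pairs = J * (k - 1)         # ordered pairs (i, j), i != j, sharing a testlet
    return (J * var_item + n_pairs * cov_pair) / J ** 2


for J in (10, 100, 1000):
    ei = var_mean_score(0.0, J, k=5, sd=1.0)       # fixed testlet size: EI holds
    no_ei = var_mean_score(0.0, J, k=J, sd=1.0)    # one nuisance trait for all items
    print(J, round(ei, 5), round(no_ei, 5))
```

The first column shrinks roughly like $1/J$, while the second levels off near the within-testlet covariance, so that sequence is not EI with respect to $\Theta$ alone.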

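Clearly, every LI item sequence is EI. Since the covariances above are unaffected by
shifting the coefficients from $a_{jm}$ to $a^*_{jm} = a_{jm} + c_j$ for any constants $c_j$, we see that
Definition 3.1 is equivalent to ones in which only positive, bounded $a_{jm}$ are allowed; or
only bounded $a_{jm}$ for which at least one response from each item has $a_{jm} = 0$ are allowed;
etc. Now, consider the expected item scores,

$$A_j(\theta) = E[A_j \mid \Theta = \theta] = \sum_{m=0}^{n-1} a_{jm} P_{jm}(\theta),$$

and the expected raw test score, or test characteristic function,

$$\bar{A}_J(\theta) = E[\bar{A}_J \mid \Theta = \theta] = \frac{1}{J}\sum_{j=1}^{J} A_j(\theta).$$

Theorem 3.1. The following are equivalent, for a sequence of polytomous items $\mathbf{X}$:

(a) $\mathbf{X}$ is EI with respect to $\Theta$;

(b) For each bounded scoring scheme $\{a_{jm}\}$ and each $\theta$,

$$\lim_{J \to \infty} E\big[(\bar{A}_J - \bar{A}_J(\theta))^2 \mid \Theta = \theta\big] = 0;$$

(c) For each bounded scoring scheme $\{a_{jm}\}$ and each $\theta$,

$$\bar{A}_J - \bar{A}_J(\theta) = \frac{1}{J}\sum_{j=1}^{J}\sum_{m=0}^{n-1} a_{jm}\big[Y_{jm} - P_{jm}(\theta)\big] \to 0$$

in probability, given $\Theta = \theta$, as $J \to \infty$.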
Proof: The proof is an easy extension of the proof of Theorem 3.2 of Stout (1990). □

Estimating $\bar{A}_J(\theta)$ is not necessarily useful unless $\theta$ is unidimensional. Just as with
binary items, a particular value $\bar{A}_J(\theta)$ may be possible for examinees with radically different
$\theta$'s due to compensation among the components of $\theta$. Hereafter, we will restrict ourselves
to unidimensional traits $\Theta$ and consider estimating each examinee's $\theta$.

When $\Theta$ is unidimensional, some sort of monotonicity condition becomes useful, so
that we can estimate $\theta$ with $\hat{\theta}_J = \bar{A}_J^{-1}(\bar{A}_J)$, where $\bar{A}_J^{-1}(\cdot)$ is the inverse of $\bar{A}_J(\theta)$. (In the
usual binary setting, for example, $\bar{A}_J^{-1}(\bar{A}_J) = \bar{P}_J^{-1}(\bar{X}_J)$, where $\bar{P}_J$ is the test characteristic curve and
$\bar{X}_J$ the proportion of correct responses.) In models that award partial
credit for partially-correct answers, it seems natural to require that the expected amount
of partial credit awarded on each item increases with the level of the latent trait:

$$A_j(\theta) \text{ is nondecreasing in } \theta \text{ for each } j. \qquad (\text{M}')$$
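The estimator $\bar{A}_J^{-1}(\bar{A}_J)$ only requires inverting a nondecreasing function, so bisection suffices. The sketch below assumes a logistic form of Samejima's graded response model with the ordered scoring $a_{jm} = m$; the item parameters and the response pattern are hypothetical.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def expected_item_score(theta, a, b, scores):
    """A_j(theta) = sum_m a_jm P_jm(theta) under a logistic graded response model."""
    cum = [1.0] + [logistic(a * (theta - bm)) for bm in b] + [0.0]
    return sum(s * (cum[m] - cum[m + 1]) for m, s in enumerate(scores))


def tcf(theta, items, scores):
    """Test characteristic function A_bar_J(theta) = (1/J) sum_j A_j(theta)."""
    return sum(expected_item_score(theta, a, b, scores) for a, b in items) / len(items)


def invert_tcf(mean_score, items, scores, lo=-8.0, hi=8.0, tol=1e-10):
    """theta_hat = A_bar_J^{-1}(mean_score) by bisection; relies on tcf being
    nondecreasing (M'). Scores outside tcf's range on [lo, hi] map to an endpoint."""
    if mean_score <= tcf(lo, items, scores):
        return lo
    if mean_score >= tcf(hi, items, scores):
        return hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if tcf(mid, items, scores) < mean_score:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical 6-item test, 4 categories per item, scored a_jm = m.
items = [(0.8 + 0.2 * j, [-1.5 + 0.3 * j, -0.2 + 0.3 * j, 1.0 + 0.3 * j]) for j in range(6)]
scores = [0, 1, 2, 3]
responses = [2, 1, 3, 2, 1, 2]                  # observed category indices
mean_score = sum(scores[m] for m in responses) / len(responses)
print(round(invert_tcf(mean_score, items, scores), 4))
```

Bisection is chosen because M′ guarantees only monotonicity, not smoothness or a usable derivative.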

What is the relationship between condition M in Section 1 and M′ above? We will
call a sequence of items $\mathbf{X}$ for which the item response categories $\{x_{jm}\}$ are indexed so
that M holds an ordered-response item sequence. On the other hand, if a scoring scheme
$\{a_{jm}\}$ satisfies, for each $j$, $0 \le a_{j0} \le a_{j1} \le \cdots \le a_{j(n-1)}$, we will call it an ordered scoring
scheme. Then, with the convention that $a_{j(-1)} = 0$,

$$A_j(\theta) = \sum_{m=0}^{n-1} a_{jm} P_{jm}(\theta) = \sum_{m=0}^{n-1} \big(a_{jm} - a_{j(m-1)}\big) P^*_{jm}(\theta).$$

It follows that condition M is equivalent to M′ holding for every ordered scoring
scheme. M is a condition that has been considered for many parametric ordered-response
models. For example, Samejima (1972) has shown that M does hold for her graded-response
models, as well as for Bock's (1972) nominal model constrained to apply to ordered-response
items (also, see Thissen and Steinberg, 1986). A somewhat milder form of monotonicity,
called LAD, is sufficient to build the estimator $\hat{\theta}_J$.
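The equivalence of M and M′ rests on a summation-by-parts identity: with $a_{j(-1)} = 0$, $\sum_m a_{jm} P_{jm}(\theta) = \sum_m (a_{jm} - a_{j(m-1)}) P^*_{jm}(\theta)$, so nonnegative increments together with nondecreasing $P^*_{jm}$ give a nondecreasing $A_j(\theta)$. The sketch below checks the identity numerically for one hypothetical item, assuming a logistic graded response model and an arbitrary ordered scoring scheme.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def cumulative(theta, a, b):
    """P*_jm(theta), m = 0..n-1, under a logistic graded response model."""
    return [1.0] + [logistic(a * (theta - bm)) for bm in b]


def score_direct(theta, a, b, scores):
    """A_j(theta) = sum_m a_jm P_jm(theta)."""
    cum = cumulative(theta, a, b) + [0.0]
    return sum(s * (cum[m] - cum[m + 1]) for m, s in enumerate(scores))


def score_by_parts(theta, a, b, scores):
    """sum_m (a_jm - a_j(m-1)) P*_jm(theta), with the convention a_j(-1) = 0."""
    increments = [s - prev for s, prev in zip(scores, [0.0] + scores[:-1])]
    return sum(d * c for d, c in zip(increments, cumulative(theta, a, b)))


scores = [0.0, 0.5, 2.0, 3.0]     # an ordered scoring scheme (hypothetical)
a, b = 1.3, [-0.8, 0.4, 1.1]
for theta in (-2.0, 0.0, 1.7):
    print(theta, score_direct(theta, a, b, scores), score_by_parts(theta, a, b, scores))
```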

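Definition 3.2. The ordered scoring scheme $\{a_{jm}\}$ is asymptotically discriminating (AD)
if and only if there exists an $\epsilon > 0$ such that

$$\frac{1}{J}\sum_{j=1}^{J}\big(a_{j(n-1)} - a_{j0}\big) > \epsilon, \quad \forall J. \qquad \text{(AD)}$$

The item sequence $\mathbf{X}$ is locally asymptotically discriminating (LAD) if and only if, for each
AD ordered scoring scheme $\{a_{jm}\}$, to every $\theta$ there corresponds an interval $N_\theta$ containing
$\theta$ and an $\epsilon_\theta > 0$ such that

$$\frac{\bar{A}_J(t) - \bar{A}_J(\theta)}{t - \theta} > \epsilon_\theta, \quad \forall t \in N_\theta,\ t \ne \theta,\ \forall J. \qquad \text{(LAD)}$$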

This generalizes LAD for binary item sequences as presented in Definition 3.8 of Stout
(1990). Note that LAD imposes a minimum discrimination condition on the test
characteristic curves at each $\theta$, as $J \to \infty$. Also, the items themselves need not have ordered
responses; only the scoring schemes $\{a_{jm}\}$ need be ordered. LAD may be viewed as
naturally extending the interpretation of M (that the expected amount of credit awarded
increases with the examinee's ability) from a fixed-length test to an item sequence,
without strictly requiring M to hold for every item in the sequence.
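AD and LAD are asymptotic conditions, so no finite computation can verify them. The sketch below only evaluates the two quantities involved for one fixed, hypothetical test under an assumed logistic graded response model: the AD margin $\frac{1}{J}\sum_j (a_{j(n-1)} - a_{j0})$, and the smallest difference quotient of $\bar{A}_J$ over a grid in a neighborhood $N_\theta$ (the half-width 0.5 is an arbitrary choice). Each item carries its own ordered scoring scheme.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def tcf(theta, items):
    """A_bar_J(theta) for items given as (a, b, scores) triples."""
    total = 0.0
    for a, b, scores in items:
        cum = [1.0] + [logistic(a * (theta - bm)) for bm in b] + [0.0]
        total += sum(s * (cum[m] - cum[m + 1]) for m, s in enumerate(scores))
    return total / len(items)


def ad_margin(items):
    """(1/J) sum_j (a_j(n-1) - a_j0); AD asks this to exceed some epsilon > 0 for every J."""
    return sum(scores[-1] - scores[0] for _, _, scores in items) / len(items)


def min_difference_quotient(theta, items, half_width=0.5, steps=50):
    """Smallest (A_bar_J(t) - A_bar_J(theta)) / (t - theta) over grid points t != theta
    in N_theta = [theta - half_width, theta + half_width]."""
    base = tcf(theta, items)
    quotients = []
    for k in range(1, steps + 1):
        for t in (theta - half_width * k / steps, theta + half_width * k / steps):
            quotients.append((tcf(t, items) - base) / (t - theta))
    return min(quotients)


items = [(1.0 + 0.1 * (j % 5), [-1.0, 0.0, 1.0], [0, 1, 2, 3]) for j in range(30)]
print(ad_margin(items))                     # 3.0 for the scoring a_jm = m
print(min_difference_quotient(0.0, items))  # positive for this particular test
```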

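Theorem 3.2. If the polytomous item sequence $\mathbf{X}$ satisfies EI and LAD with respect to the
unidimensional trait $\Theta$, then for each $\theta$ and each $\epsilon > 0$, if $\{a_{jm}\}$ is a bounded AD ordered
scoring scheme, then

$$\lim_{J \to \infty} P\big[\,|\bar{A}_J^{-1}(\bar{A}_J) - \theta| > \epsilon \mid \Theta = \theta\,\big] = 0.$$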

Proof: Virtually the same as the proof of Theorem 3.6 of Stout (1990). □

Theorem 3.3. If the polytomous item sequence $\mathbf{X}$ satisfies EI and LAD with respect to the
unidimensional trait $\Theta$, and satisfies EI with respect to another latent trait $\tau$, then there
exists a nondecreasing function $g(t)$ such that

$$P[\Theta = g(\tau)] = 1.$$

Proof: Follows Theorem 3.3 of Stout (1990) or Theorem 2.4 of Junker (1988). □

Theorems 3.1 through 3.3 show that if EI and LAD hold, we can estimate a unique
dominant latent trait with any reasonable $\bar{A}_J^{-1}(\bar{A}_J)$; any other dominant trait we might
find will be a change of scale of the trait we have estimated with $\bar{A}_J^{-1}(\bar{A}_J)$. (This is the same
level of trait uniqueness as exists under the general $d_L = 1$ model, although particular
parametric models, such as the Rasch model, may possess additional scale properties.)
Since under EI and LAD we can identify and estimate a unique unidimensional
dominant trait in the item response data, we will call this situation essentially
unidimensional, $d_E = 1$. When no single dominant trait exists in this sense, we will write $d_E > 1$.

4. Maximum Likelihood Ability Estimation 
Often it is desired to estimate individuals' $\theta$ values, treated as parameters in the
conditional model

$$P[\mathbf{X}_J = \mathbf{x}_J \mid \Theta = \theta] = \prod_{j=1}^{J}\prod_{m=0}^{n-1} P_{jm}(\theta)^{y_{jm}},$$

where $y_{jm} = 1$ when $x_j = x_{jm}$, and 0 otherwise (i.e., the $y_{jm}$ are the observed values of
$Y_{jm}$). If the polytomous item sequence $\mathbf{X}$ does not satisfy LAD, the estimators described
in Section 3 may not exist, let alone be consistent for $\theta$. Even when LAD holds, it may be
desirable to have a more efficient estimator than $\bar{A}_J^{-1}(\bar{A}_J)$.

One common method of estimating individual examinees' abilities is via maximum
likelihood, treating each examinee's $\theta$ as an unknown parameter to be estimated and the
RCC's as known. When LI holds, the maximum likelihood estimator (MLE) $\hat{\theta}_J$ is known
to be a consistent estimator for $\theta$ as $J \to \infty$, and has good asymptotic distribution
properties (asymptotic normality, efficiency, etc.), assuming that the RCC's are known (e.g.,
Lehmann, 1983). We wish to investigate the behavior of $\hat{\theta}_J$ computed under the (false)
assumption of LI with respect to $\Theta$ when the item sequence satisfies EI. Technically, $\hat{\theta}_J$ is
called an M-estimator since it is no longer based on the true likelihood, which is unknown
under EI (e.g., Serfling, 1980, pp. 243 ff.). However, for convenience we will continue to
call $\hat{\theta}_J$ the MLE, since it is based on maximizing a (wrong) likelihood.

There are two reasons for working with the MLE. First, it is commonly used for exam- 
inee scoring in applied IRT work, so we are compelled to know its behavior under realistic 
violations of LI. Second, the behavior of the MLE may be taken to be representative of 
the behavior of other likelihood-based methods. Our work with the MLE is intended to 
suggest that similar robustness to departures from LI within an EI framework could be 
expected of other popular estimators and predictors, such as estimators of the posterior 
mode and posterior mean (e.g., Samejima, 1969; Bock & Mislevy, 1982; Lord, 1986). This 
point will be taken up again in the discussion in Section 7.

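Let us now turn to the requirements for consistency of $\hat{\theta}_J$, that is, the convergence of $\hat{\theta}_J$ to $\theta$
as $J$ grows. Assuming (incorrectly) that LI holds with respect to $\Theta$, the log-likelihood for
estimating one examinee's $\theta$ based on his or her item responses $\mathbf{x}_J$ (or, equivalently, the
response-category indicators $y_{jm}$) is

$$\ell_J(\theta) = \log \prod_{j=1}^{J}\prod_{m=0}^{n-1} P_{jm}(\theta)^{y_{jm}} = \sum_{j=1}^{J}\sum_{m=0}^{n-1} \lambda_{jm}(\theta)\, y_{jm},$$

where $\lambda_{jm}(\theta) = \log P_{jm}(\theta)$. Thus, $\hat{\theta}_J$ must satisfy the likelihood equation

$$\frac{1}{J}\ell'_J(\hat{\theta}_J) = \frac{1}{J}\sum_{j=1}^{J}\sum_{m=0}^{n-1} \lambda'_{jm}(\hat{\theta}_J)\, y_{jm} = 0. \qquad (2)$$

Under LI, the fact that $\frac{1}{J}\ell'_J(\theta) \to 0$ in probability as $J \to \infty$ allows us to locate a root $\hat{\theta}_J$ of (2) near the
examinee's $\theta$. Under EI, Theorem 3.1(c) ensures that

$$\frac{1}{J}\ell'_J(\theta) = \frac{1}{J}\sum_{j=1}^{J}\sum_{m=0}^{n-1} \lambda'_{jm}(\theta)\big[y_{jm} - P_{jm}(\theta)\big] \to 0, \qquad (3)$$

in probability, given $\Theta = \theta$, as long as the scoring scheme $a_{jm} = \lambda'_{jm}(\theta)$ is bounded
uniformly in $j$ and $m$ for each $\theta$. (The centering terms may be inserted because
$\sum_m \lambda'_{jm}(\theta) P_{jm}(\theta) = \sum_m P'_{jm}(\theta) = 0$, the $P_{jm}(\theta)$ summing to 1 for each $j$.) Hence, we can expect to find a root $\hat{\theta}_J$ of (2) near $\theta$ under
EI as well. The dependence of $a_{jm}$ on $\theta$ here is irrelevant, since we are conditioning on
$\Theta = \theta$ fixed.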
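The working log-likelihood and the score in (2) are straightforward to compute once the RCCs and their derivatives are available. The sketch below assumes a logistic graded response model with hypothetical parameters and responses; it checks the identity $\sum_m \lambda'_{jm}(\theta) P_{jm}(\theta) = \sum_m P'_{jm}(\theta) = 0$ that allows (3) to be written in centered form, and compares the analytic score with a finite difference of $\ell_J$.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def probs_and_derivs(theta, a, b):
    """P_jm(theta) and P'_jm(theta), m = 0..n-1, under a logistic graded response model."""
    cum = [1.0] + [logistic(a * (theta - bm)) for bm in b] + [0.0]
    dcum = [0.0] + [a * c * (1.0 - c) for c in cum[1:-1]] + [0.0]
    n = len(b) + 1
    return ([cum[m] - cum[m + 1] for m in range(n)],
            [dcum[m] - dcum[m + 1] for m in range(n)])


def loglik_and_score(theta, items, responses):
    """l_J(theta) = sum_j lambda_{j,x_j}(theta) under the (working) LI model,
    and the scaled score (1/J) l'_J(theta) appearing in (2)."""
    ll = score = 0.0
    for (a, b), m in zip(items, responses):
        p, dp = probs_and_derivs(theta, a, b)
        ll += math.log(p[m])             # lambda_jm(theta) for the observed category
        score += dp[m] / p[m]            # lambda'_jm(theta) = P'_jm / P_jm
    return ll, score / len(items)


items = [(1.1, [-1.0, 0.2, 1.3]), (0.9, [-0.5, 0.5, 1.5]), (1.4, [-1.2, -0.1, 0.8])]
responses = [2, 1, 3]                    # observed category indices x_j
p, dp = probs_and_derivs(0.4, *items[0])
print(sum((d / q) * q for d, q in zip(dp, p)))   # sum_m lambda'_jm P_jm: 0 up to rounding
ll, score = loglik_and_score(0.4, items, responses)
h = 1e-6
fd = (loglik_and_score(0.4 + h, items, responses)[0]
      - loglik_and_score(0.4 - h, items, responses)[0]) / (2 * h * len(items))
print(score, fd)                         # analytic and finite-difference (1/J) l'_J agree
```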

To obtain the limit (3) and similar limits needed for consistency of $\hat{\theta}_J$, we assume that
for all $\theta$, there exists an interval $B_\theta$ containing $\theta$ and a constant $M_\theta < \infty$, such that

$$\big|\lambda^{(k)}_{jm}(t)\big| < M_\theta, \quad \forall t \in B_\theta,\ \forall j, m,\ k = 1, 2, 3. \qquad (4)$$

Condition (4) is really a fairly mild modeling assumption. For example, in the binary
three-parameter logistic model it would be satisfied if all the difficulty and discrimination
parameters were bounded in absolute value.
A second important consideration in likelihood-based (or indeed any) estimation is
identifiability of the parameter. The criterion used for identifiability in Section 3, LAD,
is not necessarily appropriate when the response categories are unordered. Instead, it is
typical and reasonable to require that for each $\theta$, there exists an $\epsilon_\theta > 0$ such that

$$I_J(\theta) = \frac{1}{J}\sum_{j=1}^{J}\sum_{m=0}^{n-1} \lambda'_{jm}(\theta)^2 P_{jm}(\theta) > \epsilon_\theta, \quad \forall J. \qquad (5)$$

$I_J(\theta)$ is, of course, the usual test information function divided by the test length $J$. If (5) holds, there is enough
identifiability for the MLE to work. The following proposition gives several sufficient
criteria for identifiability in this sense.
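A sketch of the average information $I_J(\theta)$ in (5), assuming a logistic graded response model with hypothetical parameters. With a single boundary the model reduces to a two-parameter logistic item, whose information at $\theta = b$ is $a^2/4$; that gives one sanity check. A second check compares $I_J(\theta)$ with $-E[\frac{1}{J}\ell''_J(\theta) \mid \Theta = \theta]$, computed here with finite differences of $\log P_{jm}$.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def probs_and_derivs(theta, a, b):
    """P_jm(theta) and P'_jm(theta) under a logistic graded response model."""
    cum = [1.0] + [logistic(a * (theta - bm)) for bm in b] + [0.0]
    dcum = [0.0] + [a * c * (1.0 - c) for c in cum[1:-1]] + [0.0]
    n = len(b) + 1
    return ([cum[m] - cum[m + 1] for m in range(n)],
            [dcum[m] - dcum[m + 1] for m in range(n)])


def avg_information(theta, items):
    """I_J(theta) = (1/J) sum_j sum_m lambda'_jm(theta)^2 P_jm(theta)
                  = (1/J) sum_j sum_m P'_jm(theta)^2 / P_jm(theta)."""
    total = 0.0
    for a, b in items:
        p, dp = probs_and_derivs(theta, a, b)
        total += sum(d * d / q for d, q in zip(dp, p))
    return total / len(items)


def minus_expected_hessian(theta, items, h=1e-4):
    """-(1/J) sum_j sum_m P_jm lambda''_jm, with lambda''_jm approximated by a
    central second difference of log P_jm."""
    total = 0.0
    for a, b in items:
        p0, _ = probs_and_derivs(theta, a, b)
        pp, _ = probs_and_derivs(theta + h, a, b)
        pm, _ = probs_and_derivs(theta - h, a, b)
        for q0, qp, qm in zip(p0, pp, pm):
            lam2 = (math.log(qp) - 2.0 * math.log(q0) + math.log(qm)) / h ** 2
            total -= q0 * lam2
    return total / len(items)


print(avg_information(0.0, [(1.5, [0.0])]))   # 2PL check: 1.5**2 / 4 = 0.5625
items = [(1.1, [-1.0, 0.2, 1.3]), (0.9, [-0.5, 0.5, 1.5]), (1.4, [-1.2, -0.1, 0.8])]
print(avg_information(0.3, items), minus_expected_hessian(0.3, items))
```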

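Proposition 4.1. If any of the following conditions hold, then (5) holds.

(a) $\forall \theta,\ \exists \epsilon_\theta > 0: \ \frac{1}{J}\sum_{j=1}^{J}\sum_{m=0}^{n-1} \lambda'_{jm}(\theta)^2 P_{jm}(\theta) > \epsilon_\theta,\ \forall J$;

(b) $\forall \theta,\ \exists \epsilon_\theta > 0: \ \frac{1}{J}\sum_{j=1}^{J}\sum_{m=0}^{n-1} P'_{jm}(\theta)^2 > \epsilon_\theta,\ \forall J$;

(c) $\forall \theta,\ \exists \epsilon_\theta > 0: \ \frac{1}{J}\sum_{j=1}^{J}\sum_{m=0}^{n-1} |P'_{jm}(\theta)| > \epsilon_\theta,\ \forall J$;

(d) $\forall \theta,\ \exists \epsilon_\theta > 0: \ \frac{1}{J}\sum_{j=1}^{J} |P'_{j0}(\theta)| > \epsilon_\theta,\ \forall J$.

Proof: Condition (a) is exactly (5); condition (b) suffices by (a) and the fact that
$1/P_{jm}(\theta) \ge 1$ always holds, so that $\lambda'_{jm}(\theta)^2 P_{jm}(\theta) = P'_{jm}(\theta)^2/P_{jm}(\theta) \ge P'_{jm}(\theta)^2$; condition (c) suffices by (b) and the fact that

$$\Big(\frac{1}{J}\sum_{j=1}^{J}\sum_{m=0}^{n-1} |P'_{jm}(\theta)|\Big)^2 \le \frac{n}{J}\sum_{j=1}^{J}\sum_{m=0}^{n-1} P'_{jm}(\theta)^2$$

by Jensen's inequality (Ash, 1972, p. 287). Finally, condition (d) suffices by (c), noting that
$P'_{j0}(\theta) = -\sum_{m=1}^{n-1} P'_{jm}(\theta)$ (the $P_{jm}(\theta)$ sum to 1) and using Jensen's inequality again,
$|P'_{j0}(\theta)| \le \sum_{m=1}^{n-1} |P'_{jm}(\theta)|$.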
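The chain of bounds in the proof of Proposition 4.1 can be checked numerically for a particular test. The sketch below assumes a logistic graded response model with hypothetical parameters and computes, at one value of $\theta$, the averages $\frac{1}{J}\sum\sum P'^2_{jm}/P_{jm}$, $\frac{1}{J}\sum\sum P'^2_{jm}$, $\frac{1}{J}\sum\sum |P'_{jm}|$, and (taking the quantity in (d) to be) $\frac{1}{J}\sum_j |P'_{j0}|$. It illustrates the inequalities rather than proving them.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def probs_and_derivs(theta, a, b):
    """P_jm(theta) and P'_jm(theta) under a logistic graded response model."""
    cum = [1.0] + [logistic(a * (theta - bm)) for bm in b] + [0.0]
    dcum = [0.0] + [a * c * (1.0 - c) for c in cum[1:-1]] + [0.0]
    n = len(b) + 1
    return ([cum[m] - cum[m + 1] for m in range(n)],
            [dcum[m] - dcum[m + 1] for m in range(n)])


def criteria(theta, items):
    """Left-hand sides of conditions (a)-(d) for one fixed test."""
    qa = qb = qc = qd = 0.0
    for a, b in items:
        p, dp = probs_and_derivs(theta, a, b)
        qa += sum(d * d / q for d, q in zip(dp, p))   # lambda'^2 P = P'^2 / P
        qb += sum(d * d for d in dp)
        qc += sum(abs(d) for d in dp)
        qd += abs(dp[0])
    J = len(items)
    return qa / J, qb / J, qc / J, qd / J


items = [(1.1, [-1.0, 0.2, 1.3]), (0.9, [-0.5, 0.5, 1.5]), (1.4, [-1.2, -0.1, 0.8])]
n = 4                                       # categories per item
qa, qb, qc, qd = criteria(0.3, items)
print(qa >= qb >= qc ** 2 / n, qc >= qd)    # True True
```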
The most interesting of the criteria in Proposition 4.1 is (d). Note that by taking
$a_{j0} = 0$ and $a_{jm} = 1$ for $m > 0$ in the definition of LAD, we see that if LAD holds and the
RCC's are differentiable, then (5) holds also.

Each of the conditions M, LAD, and (5) represents an identifiability or detection
condition for the sequence $\mathbf{X}$ and latent trait $\Theta$, and they fit into a rather neat hierarchy for
essentially independent smooth IRT models. M is the most restrictive identification
condition; it imposes a highly interpretable condition on each item in the test which virtually
guarantees LAD. LAD is less restrictive, in that it imposes the interpretation of M at the
test characteristic curve level, not the level of individual items. Moreover, LAD implies
(5). The minimum information condition (5) is least interpretable, but has the advantage
of widest applicability. Moreover, as Theorem 4.1 below shows, if (5) holds, then $\hat{\theta}_J$
converges to $\theta$, given $\Theta = \theta$. This hierarchy is not new or deep mathematically, but serves to
illustrate the transition from intuitively appealing psychological models to adequate but
less pleasing statistical ones.

Theorem 4.1. Let $\mathbf{X}$ be a polytomous item sequence satisfying EI, (4), and (5). Then there
exists a sequence $\{\hat{\theta}_J : J \ge J_\theta\}$ of roots of (2) such that

$$\lim_{J \to \infty} P\big[\,|\hat{\theta}_J - \theta| < \epsilon \mid \Theta = \theta\,\big] = 1,$$

for every $\epsilon > 0$.

Note that the sequence $\hat{\theta}_J$ may not start at $J = 1$, and for small $J$, there may be no
solutions to (2). This is not a serious limitation; see Theorem 4.2 below. Also, when LAD
holds, the trait being estimated is the same dominant trait whose estimation was treated
in Section 3; this follows from Theorem 3.3. The novelty of the following Cramér-style
proof is that (local) independence is not assumed.

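Proof: Let $\epsilon_0 > 0$ be arbitrary and fixed in advance. Without loss of generality, we assume
that $(\theta - \epsilon_0, \theta + \epsilon_0) \subset B_\theta$, where $B_\theta$ is the interval given in (4). Our goal is to obtain
roots of (2) in the interval $(\theta - \epsilon_0, \theta + \epsilon_0)$. The second-order Taylor polynomial for $\frac{1}{J}\ell'_J(t)$
in $(\theta - \epsilon_0, \theta + \epsilon_0)$ is

$$\begin{aligned}
\frac{1}{J}\ell'_J(t) &= \frac{1}{J}\ell'_J(\theta) + \frac{1}{J}(t - \theta)\,\ell''_J(\theta) + \frac{1}{2J}(t - \theta)^2\,\ell'''_J(\theta^*) \\
&= \frac{1}{J}\sum_{j=1}^{J}\sum_{m=0}^{n-1} \lambda'_{jm}(\theta)\,y_{jm} + (t - \theta)\,\frac{1}{J}\sum_{j=1}^{J}\sum_{m=0}^{n-1} \lambda''_{jm}(\theta)\,y_{jm} + \frac{1}{2}(t - \theta)^2\,\frac{1}{J}\sum_{j=1}^{J}\sum_{m=0}^{n-1} \lambda'''_{jm}(\theta^*)\,y_{jm},
\end{aligned} \qquad (6)$$

where $\theta^* = \theta + r(t - \theta)$, for some $0 < r < 1$. We have already shown in (3) that $\frac{1}{J}\ell'_J(\theta) \to 0$
in probability, given $\Theta = \theta$, under EI and (4). Similarly,

$$\frac{1}{J}\sum_{j=1}^{J}\sum_{m=0}^{n-1} \lambda''_{jm}(\theta)\big[y_{jm} - P_{jm}(\theta)\big] \to 0.$$

Since $E[\frac{1}{J}\ell''_J(\theta) \mid \Theta = \theta] = -I_J(\theta)$, this implies that $\frac{1}{J}\ell''_J(\theta) + I_J(\theta) \to 0$ in probability, given
$\Theta = \theta$. Hence, using (4) again, we may rewrite (6) as

$$\frac{1}{J}\ell'_J(t) = o_p(1) - (t - \theta)\Big[I_J(\theta) - \tfrac{1}{2}(t - \theta)\,\beta_J M_\theta\Big],$$

where $\beta_J$ is a random quantity satisfying $|\beta_J| \le 1$, $o_p(1)$ denotes quantities tending to
0 in probability, given $\Theta = \theta$, and $I_J(\theta)$ is bounded away from 0 by (5). Thus, for large
$J$, $\frac{1}{J}\ell'_J(t)$ is approximately linear near $\theta$, and with large probability is positive for some
$t \in (\theta - \epsilon_0, \theta)$, negative for some $t \in (\theta, \theta + \epsilon_0)$, and by continuity equal to zero for some
$\hat{\theta}_J \in (\theta - \epsilon_0, \theta + \epsilon_0)$. Hence, there is a sequence $\hat{\theta}_J$ of solutions to (2) with the property
that for any $\epsilon_0 > 0$,

$$P\big[\,|\hat{\theta}_J - \theta| < \epsilon_0 \mid \Theta = \theta\,\big] \to 1,$$

as $J \to \infty$. Further details may be found in Serfling (1980, pp. 143-148). □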

In general, we do not expect the roots $\hat{\theta}_J$ of (2) to be unique (e.g., Samejima, 1973).
Moreover, among the multiple roots of (2), there is likely to be only one consistent root
sequence: Foutz (1977) proves that if $\hat{\theta}_{1J}$ and $\hat{\theta}_{2J}$ are both consistent root sequences,
then under LI, $P[\hat{\theta}_{1J} = \hat{\theta}_{2J} \mid \Theta = \theta] \to 1$ as $J \to \infty$. Thus, the situation, even under LI,
is opposite that portrayed by Lord (1980, p. 59): rather than being optimistic that the
roots should be eventually unique, one might be pessimistic that multiple roots continue
to happen as $J \to \infty$, and only one of these roots, for each $J$, brings us closer to the true $\theta$.

This is not a practical problem, however. We shall see next that the standard practice
of approximating a root of (2) by Newton's method produces estimates that are consistent
for $\theta$ under EI, even though the MLE and the Newton's method estimate of it were
computed under the assumption of LI. Thus, familiar numerical methods continue to be useful
in estimating $\theta$ under EI.

Theorem 4.2. Suppose the assumptions of Theorem 4.1 hold, and let $\tilde{\theta}_J$ be any sequence
of consistent estimates for $\theta$, given $\Theta = \theta$. Then the Newton's method improvement,

$$\delta_J = \tilde{\theta}_J - \frac{\ell'_J(\tilde{\theta}_J)}{\ell''_J(\tilde{\theta}_J)}, \qquad (7)$$

is also consistent for $\theta$.
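Proof: As in Lehmann (1983, p. 423) we may substitute a Taylor expansion of $\frac{1}{J}\ell'_J(\tilde{\theta}_J)$
about $\theta$ into (7) to obtain

$$\delta_J - \theta = \frac{-\frac{1}{J}\ell'_J(\theta)}{\frac{1}{J}\ell''_J(\tilde{\theta}_J)} + (\tilde{\theta}_J - \theta)\left[1 - \frac{\ell''_J(\theta^*_J)}{\ell''_J(\tilde{\theta}_J)}\right], \qquad (8)$$

where $\theta^*_J$ lies between $\theta$ and $\tilde{\theta}_J$. The second term on the right clearly tends to zero as $\tilde{\theta}_J \to \theta$. For the first term, a continuity
argument shows that $-\frac{1}{J}\ell''_J(\tilde{\theta}_J) - I_J(\theta) \to 0$ in probability, with $I_J(\theta) > \epsilon_\theta > 0$ by (5), and we know from (3) that $\frac{1}{J}\ell'_J(\theta) \to 0$. □

Clearly, the assertion of the theorem can be iterated to show that the result $\hat{\theta}_J$ of,
say, twenty Newton steps from $\tilde{\theta}_J$ would also be consistent. Such an estimator should be
closer, in some sense, to the consistent roots found in Theorem 4.1. Newton's method
requires an initial guess $\tilde{\theta}_J$; when LAD holds, $\tilde{\theta}_J = \bar{A}_J^{-1}(\bar{A}_J)$ is a natural choice, in view
of Theorem 3.2.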
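The two-stage recipe just described (start at $\bar{A}_J^{-1}(\bar{A}_J)$, then take Newton steps on the working LI log-likelihood) can be sketched as follows. The logistic graded response model, the scoring $a_{jm} = m$ for the starting value, and all parameters and responses are hypothetical; the loop stops early if $\ell''_J$ is not negative, since a Newton step is then unreliable.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def grm(theta, a, b):
    """P_jm, P'_jm, P''_jm (m = 0..n-1) under a logistic graded response model."""
    cdf = [logistic(a * (theta - bm)) for bm in b]
    cum = [1.0] + cdf + [0.0]
    d1 = [0.0] + [a * x * (1 - x) for x in cdf] + [0.0]
    d2 = [0.0] + [a * a * x * (1 - x) * (1 - 2 * x) for x in cdf] + [0.0]
    n = len(b) + 1

    def diff(v):
        return [v[m] - v[m + 1] for m in range(n)]

    return diff(cum), diff(d1), diff(d2)


def score_hessian(theta, items, responses):
    """l'_J(theta) and l''_J(theta) for the working LI log-likelihood."""
    s = h = 0.0
    for (a, b), m in zip(items, responses):
        p, d1, d2 = grm(theta, a, b)
        lam1 = d1[m] / p[m]
        s += lam1
        h += d2[m] / p[m] - lam1 * lam1   # lambda'' = P''/P - (P'/P)^2
    return s, h


def start_value(items, responses, lo=-8.0, hi=8.0):
    """A_bar_J^{-1}(A_bar_J) with a_jm = m, by bisection on the test characteristic function."""
    target = sum(responses) / len(responses)

    def tcf(t):
        return sum(sum(m * q for m, q in enumerate(grm(t, a, b)[0]))
                   for a, b in items) / len(items)

    for _ in range(60):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if tcf(mid) < target else (lo, mid)
    return 0.5 * (lo + hi)


def newton(theta, items, responses, steps=20):
    """Iterate theta <- theta - l'_J(theta) / l''_J(theta), as in (7)."""
    for _ in range(steps):
        s, h = score_hessian(theta, items, responses)
        if h >= 0:
            break
        theta -= s / h
    return theta


items = [(0.8 + 0.1 * j, [-1.2 + 0.2 * j, 0.2 * j, 1.1 + 0.2 * j]) for j in range(10)]
responses = [1, 2, 2, 3, 1, 0, 2, 1, 2, 3]
theta0 = start_value(items, responses)
theta_hat = newton(theta0, items, responses)
print(round(theta0, 4), round(theta_hat, 4))
```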

5. Standard Error of the MLE 

In the usual LI ability estimation theory, we expect that the sequence $\hat{\theta}_J$ will be
asymptotically normal and efficient,

$$\sqrt{J}\,(\hat{\theta}_J - \theta) \ \text{is}\ AN\big(0,\ 1/I_J(\theta)\big), \qquad (9)$$

as $J \to \infty$, where $I_J(\theta)$ is the traditional test information function (divided by $J$) introduced in (5). A
result like (9) identifying the standard error of $\hat{\theta}_J$ is needed to do statistical inference using
$\hat{\theta}_J$, or indeed merely to know how well to trust $\hat{\theta}_J$ as an estimator of $\theta$ for the particular fixed
values of $J$ that arise in applications. However, (9) may fail in the essentially unidimensional case
in two interesting ways: it may be that asymptotic normality holds but the asymptotic
variance is no longer $I_J(\theta)^{-1}$, or it may be that asymptotic normality fails completely.

When asymptotic normality does hold, we shall see that the deviation from the efficient variance is controlled by the quantity

$$C_J(\theta) = \frac{2}{J}\sum_{j=2}^{J}\sum_{i=1}^{j-1}\mathrm{Cov}(A_i, A_j \mid \Theta = \theta), \qquad (10)$$

where the item scores $A_j$ are constructed from the scoring scheme

$$a_{jm} = \lambda_{jm}'(\theta), \quad \forall\, j, m. \qquad (11)$$

The scoring scheme (11) is a technical device that will be used throughout this section. The reader should not be misled into thinking that (11) is a scoring scheme that could be applied to obtain a practical estimator as in Section 2 (to do so, we would already have to know $\theta$!). Under EI and the bounds (4), we know that $C_J(\theta)/J \to 0$ for all $\theta$, but the behavior of $C_J(\theta)$ itself depends on the amount of local dependence in the item sequence $X$. Under LI, of course, $C_J(\theta) = 0$.
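As a small numerical illustration of (11), the Python sketch below computes the weights $a_{jm} = \lambda_{jm}'(\theta)$ for a single polytomous item by differentiating the log-category-probabilities numerically. The partial credit form of the category probabilities and the step parameters in `deltas` are assumptions made only for this example. The weights depend on $\theta$, which is exactly why (11) is a technical device rather than a usable scoring scheme; the checks confirm that the resulting item score has conditional mean zero and conditional variance equal to the item's information.

```python
import math

def pcm_probs(theta, deltas):
    # Hypothetical partial credit model with categories m = 0..len(deltas):
    # log P_m(theta) = sum_{k<=m} (theta - delta_k) - log(normalizer).
    logits = [0.0]
    for d in deltas:
        logits.append(logits[-1] + theta - d)
    top = max(logits)
    w = [math.exp(z - top) for z in logits]
    total = sum(w)
    return [wi / total for wi in w]

def scoring_scheme_11(theta, deltas, h=1e-5):
    # a_m = lambda_m'(theta) = d/dtheta log P_m(theta), by a central difference.
    up = pcm_probs(theta + h, deltas)
    down = pcm_probs(theta - h, deltas)
    return [(math.log(u) - math.log(d)) / (2.0 * h) for u, d in zip(up, down)]

theta, deltas = 0.3, [-0.5, 0.4]
p = pcm_probs(theta, deltas)
a = scoring_scheme_11(theta, deltas)
mean_score = sum(pm * am for pm, am in zip(p, a))   # E[A_j | theta], should be 0
info = sum(pm * am * am for pm, am in zip(p, a))    # Var(A_j | theta) = item information
```

For this particular model the weights reduce to $m - E[X_j \mid \theta]$, which gives an independent cross-check on the numerical derivatives.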

To see the effect of $C_J(\theta)$ on (9) under EI, we may deduce from (6) that

$$\hat\theta_J - \theta \simeq \frac{\frac{1}{J}\ell_J'(\theta)}{I_J(\theta)}, \qquad (12)$$

in the sense that, after multiplication by a common normalizing factor, the asymptotic distributions of the left and right hand sides are the same. If we can identify the asymptotic distribution of (a multiple of) $J^{-1}\ell_J'(\theta)$, then by (12) we will also be able to identify the asymptotic distribution and rate of convergence of $\hat\theta_J$. An indication of what is possible is provided by Theorem 5.1 below. Let us abbreviate

$$\sigma_J^2(\theta) = \mathrm{Var}(\bar A_J \mid \theta) = \frac{1}{J^2}\sum_{j=1}^{J}\mathrm{Var}(A_j \mid \theta) + \frac{2}{J^2}\sum_{j=2}^{J}\sum_{i=1}^{j-1}\mathrm{Cov}(A_i, A_j \mid \theta), \qquad (13)$$

for whatever scoring scheme $\{a_{jm}\}$ is currently under consideration. Under the scoring scheme (11), $\bar A_J - \bar A_J(\theta) \equiv \frac{1}{J}\ell_J'(\theta)$ and $\sigma_J^2(\theta) = \frac{1}{J}[I_J(\theta) + C_J(\theta)]$.
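The bookkeeping in (10) and (13) can be sketched in Python as follows, assuming the conditional covariance matrix of the item scores at a fixed $\theta$ is available (in practice it is not, and would have to be estimated). The helper `block_cov` is a hypothetical name; it builds the covariance structure of Example 5.1 below under scoring scheme (11), where the variance part equals $I_J(\theta)$, so the identity $\sigma_J^2(\theta) = [I_J(\theta) + C_J(\theta)]/J$ can be checked directly.

```python
def dependence_summary(cov):
    # cov[i][j] = Cov(A_i, A_j | Theta = theta) at a fixed theta (symmetric J x J list).
    J = len(cov)
    total = sum(sum(row) for row in cov)        # J^2 * Var(Abar_J | theta)
    trace = sum(cov[j][j] for j in range(J))    # sum_j Var(A_j | theta)
    sigma2 = total / J ** 2                     # sigma_J^2(theta), as in (13)
    var_part = trace / J                        # equals I_J(theta) under scoring scheme (11)
    C = (total - trace) / J                     # C_J(theta) = (2/J) sum_{i<j} Cov(A_i, A_j)
    return sigma2, var_part, C

def block_cov(J, g0, theta, c):
    # Covariances of A_j = (X_j - theta) / [theta (1 - theta)] in Example 5.1:
    # consecutive groups of g0 items, within-group correlation c, independence across groups.
    v = 1.0 / (theta * (1.0 - theta))
    return [[v if i == j else (c * v if i // g0 == j // g0 else 0.0)
             for j in range(J)] for i in range(J)]

sigma2, var_part, C = dependence_summary(block_cov(6, 3, 0.3, 0.2))
```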

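Theorem 5.1. Suppose that the conditions of Theorem 4.1 hold for the item sequence $X$ and the latent trait $\Theta$. Also, suppose that for some fixed $\theta$, the scoring scheme (11) yields

$$\frac{1}{\sigma_J(\theta)}\big[\bar A_J - \bar A_J(\theta)\big] \to AN(0,1), \qquad (14)$$

given $\Theta = \theta$. Then, as $J \to \infty$,

(a) if $C_J(\theta) \to 0$ as $J \to \infty$, then $J^{1/2}(\hat\theta_J - \theta) \to AN\big(0,\ 1/I_J(\theta)\big)$;

(b) if $C_J(\theta)$ remains bounded for all $J$, then $J^{1/2}(\hat\theta_J - \theta) \to AN\big(0,\ [I_J(\theta) + C_J(\theta)]/I_J^2(\theta)\big)$;

(c) if $C_J(\theta)$ is unbounded and $R(J)$ is a function of $J$ for which $R^2(J)C_J(\theta)/J$ remains bounded, then $R(J)(\hat\theta_J - \theta) \to AN\big(0,\ R^2(J)[I_J(\theta) + C_J(\theta)]/(J\,I_J^2(\theta))\big)$.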

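Proof: From (12) and (14), only the asymptotic variance assertions need to be checked. For the scoring scheme (11), we have from (12), (13), and (14):

$$R(J)(\hat\theta_J - \theta) \simeq \frac{R(J)\,\sigma_J(\theta)}{I_J(\theta)}\cdot\frac{\bar A_J - \bar A_J(\theta)}{\sigma_J(\theta)}, \qquad \frac{R^2(J)\,\sigma_J^2(\theta)}{I_J^2(\theta)} = \frac{R^2(J)\,[I_J(\theta) + C_J(\theta)]}{J\,I_J^2(\theta)}.$$

The assertions about the asymptotic variances in (a), (b), and (c) follow from this calculation by choosing $R(J)$ appropriately. □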

Conditions (a) and (b) of the theorem correspond to the familiar case in which the rate of convergence of $\hat\theta_J$ to $\theta$ is $J^{-1/2}$. If $C_J(\theta) \to 0$ as $J \to \infty$, we get the usual asymptotically normal and efficient result (9) for $\hat\theta_J$. Otherwise, we get subefficiency or superefficiency depending on the sign of $C_J(\theta)$. The use of the terms efficient, subefficient, and superefficient to describe the asymptotic variance as being equal to, greater than, or less than $I_J^{-1}(\theta)$ is suggestive here but perhaps misleading. In fact, $I_J^{-1}(\theta)$ is the efficient variance only when $\ell_J(\theta)$ is the true log-likelihood function. Under EI, some other (unknown) log-likelihood function $L_J$ applies, and examining the true efficiency of $\hat\theta_J$ would require access to the (unknown) function $E[-L_J''(\theta) \mid \Theta = \theta]$.

Condition (c) corresponds to the rate of convergence of $\hat\theta_J$ to $\theta$ being slower than $J^{-1/2}$. This would happen, for example, if the inter-item covariances were generally positive and sufficiently large to force $C_J(\theta)$ to be unbounded. Formally, there is also the possibility that the convergence of $\hat\theta_J$ to $\theta$ could be faster than $J^{-1/2}$, but this would require that $I_J(\theta) + C_J(\theta) \to 0$, that is, $C_J(\theta)$ negative for all large $J$. As we will argue next, this seems unlikely in many educational testing applications. Hence, this possibility was omitted from the theorem statement.

For reasonably homogeneous tests, one intuitively expects that items not independent given $\theta$ would be positively correlated. This is certainly implicit in the factor-analytic tradition of test theory (e.g., Anastasi, 1988, pp. 377 ff.). An example of the invocation of this principle in IRT research is the design of the simulation study in Drasgow and Parsons (1983). Indeed, it is quite reasonable to assume that if $X$ is essentially unidimensional with respect to the trait $\Theta$, there are other traits $\Theta_2, \Theta_3, \ldots, \Theta_d$ such that LI holds with respect to the $d$-dimensional trait vector $(\Theta, \Theta_2, \Theta_3, \ldots, \Theta_d)$ (see, for example, Stout, 1987). If these traits are psychologically meaningful, it is also reasonable to assume that they will be associated (see Holland and Rosenbaum, 1986, for a definition), given $\Theta = \theta$. In the ordered-response case, a result of Jogdeo (1978) can be used to argue that, conditional on $\Theta = \theta$ alone, the inter-item covariances will be nonnegative (indeed, any ordered scoring scheme for $X$, given $\theta$, will be associated). Thus, $C_J(\theta) > 0$ will generally be expected: the variance of $J^{1/2}(\hat\theta_J - \theta)$ will generally be higher than $1/I_J(\theta)$.

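Theorem 4.2 gave a practical way to approximate the estimator $\hat\theta_J$. The following corollary extends Theorem 5.1 to obtain asymptotic normality for this approximation.

Corollary 5.1. Suppose that the conditions of Theorem 5.1 hold, and that $\tilde\theta_J$ is any estimator with $R(J)(\tilde\theta_J - \theta)$ bounded in probability. The Newton's method approximation $\theta^*_J$ of Theorem 4.2 based on $\tilde\theta_J$ satisfies

$$R(J)(\theta^*_J - \theta) \to AN\left(0,\ \frac{R^2(J)\,[I_J(\theta) + C_J(\theta)]}{J\,I_J^2(\theta)}\right).$$

Making appropriate choices of $R(J)$, we obtain the same three cases as in Theorem 5.1.

Proof: Using (11) and (13), we may rewrite (8) as

$$R(J)(\theta^*_J - \theta) = \frac{R(J)\,\big([I_J(\theta) + C_J(\theta)]/J\big)^{1/2}}{-\frac{1}{J}\ell_J''(\tilde\theta_J)}\cdot\frac{\bar A_J - \bar A_J(\theta)}{\sigma_J(\theta)} + R(J)(\tilde\theta_J - \theta)\left[1 - \frac{\ell_J''(\theta^\dagger_J)}{\ell_J''(\tilde\theta_J)}\right].$$

The result follows from (14), since $R(J)(\tilde\theta_J - \theta)$ is bounded in probability. □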

By definition, $R(J)(\tilde\theta_J - \theta)$ is bounded in probability if $P[R(J)\,|\tilde\theta_J - \theta| < B \mid \Theta = \theta]$ can be made arbitrarily close to 1 as $J \to \infty$ by choosing $B$ large enough. In Section 4 we suggested using $\tilde\theta_J = \bar A_J^{-1}(\bar A_J)$, for any convenient scoring scheme $\{a_{jm}\}$, as an initial guess for Newton's method under LAD. Routine calculation using Chebyshev's inequality shows that $R(J)(\tilde\theta_J - \theta)$ will then be bounded in probability as long as

$$\frac{R^2(J)}{J^2}\sum_{i=1}^{J}\sum_{j=1}^{J}\mathrm{Cov}(A_i, A_j \mid \Theta = \theta) \ \text{ is bounded,} \qquad (15)$$

as $J \to \infty$. This represents a strengthening of EI, since $R(J)$ is a fixed, increasing function of $J$ for all $\theta$. The assumption that (15) holds for scoring scheme (11) is also implicit in Theorem 5.1. Hereafter, we will say that fast EI holds if (15) holds for a fixed rate $R(J)$ and every bounded scoring scheme $\{a_{jm}\}$.
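The Chebyshev step can be written out as a short calculation. The Python sketch below assumes, as a stand-in for LAD, that the test characteristic curve $\bar A_J(\cdot)$ has slope at least `slope` $> 0$ between $\theta$ and $\tilde\theta_J$, so that the mean value theorem gives $|\tilde\theta_J - \theta| \le |\bar A_J - \bar A_J(\theta)|/\mathrm{slope}$; the function name and arguments are illustrative. The returned radius $B$ stays bounded in $J$ exactly when $R^2(J)\sigma_J^2(\theta)$ does, and $R^2(J)\sigma_J^2(\theta)$ is the quantity in (15).

```python
import math

def chebyshev_radius(R, sigma2, slope, delta):
    # Returns B with P[R(J) |theta_tilde - theta| >= B | theta] <= delta, assuming the
    # test characteristic curve has derivative >= slope > 0 between theta and theta_tilde,
    # so |theta_tilde - theta| <= |Abar_J - Abar_J(theta)| / slope (mean value theorem).
    # Chebyshev: P[|Abar_J - Abar_J(theta)| >= t] <= sigma2 / t^2, sigma2 = sigma_J^2(theta).
    return R * math.sqrt(sigma2 / delta) / slope

# With R(J) = J^{1/2} and sigma_J^2(theta) = 2 / J, the radius does not grow with J.
b_small = chebyshev_radius(math.sqrt(100.0), 2.0 / 100.0, 0.5, 0.05)
b_large = chebyshev_radius(math.sqrt(1e4), 2.0 / 1e4, 0.5, 0.05)
```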

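Theorem 5.1 also assumes that the raw score $\bar A_J$ is asymptotically normal in the sense of (14). Is this realistic? The following Central Limit Theorem (CLT) for dependent random variables, easily deduced from Theorem 2.2 of Dvoretzky (1972), sheds light on the qualitative side of this question.

Theorem 5.2. Suppose, for some fixed $\theta$ and some bounded scoring scheme $\{a_{jm}\}$:

(a) $J^2\sigma_J^2(\theta) \to \infty$;

(b) $\frac{1}{J\sigma_J(\theta)}\sum_{j=1}^{J} E[A_j - A_j(\theta) \mid \mathbf{X}_{j-1}, \Theta = \theta] \to 0$; and

(c) $\frac{1}{J^2\sigma_J^2(\theta)}\sum_{j=1}^{J} E[(A_j - A_j(\theta))^2 \mid \mathbf{X}_{j-1}, \Theta = \theta] \to 1$,

as $J \to \infty$, where $\mathbf{X}_{j-1} = (X_1, \ldots, X_{j-1})$. Then, for this $\theta$ and $\{a_{jm}\}$,

$$\frac{1}{\sigma_J(\theta)}\big[\bar A_J - \bar A_J(\theta)\big] \to AN(0,1). \qquad (16)$$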

The assumptions of Theorem 5.2 would be difficult to verify in practice, but this is somewhat offset by the fact that they are intuitively meaningful; thus, we can at least ask whether these assumptions are qualitatively appealing. Assumption (a) is merely a way of ensuring that most items contribute significantly to $\bar A_J$. It is difficult to imagine a useful item sequence or scoring scheme for which this would not be true. The conditioning in (b) is not only on a fixed value $\theta$ of $\Theta$, but also on a fixed value of $\mathbf{X}_{j-1}$, for each $j$. If the conditioning on $\mathbf{X}_{j-1}$ were dropped, (b) would become an exact equality, since $E[A_j - A_j(\theta) \mid \Theta = \theta] = 0$. Under EI, conditioning on $\Theta = \theta$ stabilizes $\bar A_{j-1}$ with high probability when $j$ is large (Theorem 3.1), so we might expect that assumption (b) would hold for many EI item sequences. To gain some intuition about assumption (c), we may rewrite it as

$$\frac{\sum_{j=1}^{J} E[(A_j - A_j(\theta))^2 \mid \mathbf{X}_{j-1}, \Theta = \theta]}{\sum_{j=1}^{J}\mathrm{Var}(A_j \mid \theta) + 2\sum_{j=2}^{J}\sum_{i=1}^{j-1}\mathrm{Cov}(A_i, A_j \mid \theta)} \to 1. \qquad (17)$$

Hence, recalling that the $A_j$ are bounded, (c) implies the fast EI condition,

$$\frac{2}{J}\sum_{j=2}^{J}\sum_{i=1}^{j-1}\mathrm{Cov}(A_i, A_j \mid \theta) \ \text{ is bounded, as } J \to \infty. \qquad (18)$$

This condition is almost ubiquitous in general CLTs for dependent random variables (e.g., Bradley, 1986). Note that (18) precludes applying Theorem 5.2 in the situation of Theorem 5.1(c); moreover, from (17) we can see that some additional balancing between the variance and covariance terms is needed for assumption (c) to hold. Example 5.2 below shows that the situation of Theorem 5.1(c) can nevertheless occur.

There is another way in which the assumptions (b) and (c) are not entirely innocuous. EI and its strengthening (18) are second-order conditions (i.e., conditions that restrict only the expected values, given $\theta$, of products of two item responses at a time). It is well known that second-order conditions alone are not enough to guarantee (16). An example (without reference to latent traits) is constructed by Bradley (1989, Section 2) of a dichotomous sequence $X$ for which $X_i$ and $X_j$ are independent for every pair $i \neq j$ and yet the CLT fails. (The reader is referred to Bradley's paper for the rather complicated construction; also, see Bradley, 1985.) In light of the recent interest in Markov dependence among items (e.g., Jannarone, 1986; Spray & Ackerman, 1986), it is intriguing to observe that Bradley's example arises as a dichotomous scoring scheme for a Markov chain.

We conclude this section with two simpler examples illustrating the practical effects that item dependence can have on the standard error of $\hat\theta_J$. The examples are both variations of the paragraph comprehension example of Stout (1990, Example 2.3). Section 4.2 of Rosenbaum (1988) is also relevant. More complicated examples, and examples in other realistic settings, might also be constructed.

Example 5.1. Suppose $X_1, X_2, X_3, \ldots$ are binary item response variables having the same response curve $P_j(\theta) = \theta$ (so the latent scale is the interval $(0,1)$ and $P[X_j = 1 \mid \Theta = \theta] = \theta$). Moreover, suppose that the items are arranged in successive groups of $g_0$ items as

$X_1, X_2, \ldots, X_{g_0}$;
$X_{g_0+1}, X_{g_0+2}, \ldots, X_{2g_0}$;
etc.,

such that different groups of $g_0$ items are independent of one another, given $\theta$, and items within a single group are positively correlated, given $\theta$. For simplicity, we will take, for $i \neq j$,

$$\mathrm{Corr}(X_i, X_j \mid \Theta = \theta) = \begin{cases} c & \text{if } X_i \text{ and } X_j \text{ are in the same group},\\ 0 & \text{otherwise}, \end{cases}$$

for some fixed $c \in (0, 1)$. This is a naive model for a paragraph comprehension test in which several paragraphs are presented and $g_0$ questions are asked for each paragraph. Here, $\theta$ represents a trait common to all the items, which we might wish to think of as reading comprehension; and the nonzero correlations are induced by nuisance traits, for example, specific knowledge about the subject matter of the paragraph at hand.

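It is straightforward to verify that EI holds for this sequence of items; that is, for any bounded scoring scheme $\{a_{jm} : m = 0, 1;\ j = 1, 2, \ldots\}$ generating item scores $A_j = (1 - X_j)a_{j0} + X_j a_{j1}$,

$$\binom{J}{2}^{-1}\sum_{1 \le i < j \le J}\big|\mathrm{Cov}(A_i, A_j \mid \Theta = \theta)\big| \to 0 \quad \text{as } J \to \infty.$$

Moreover, it can be verified (via Theorem 5.2, or by applying the usual CLT to the paragraph scores, i.e., the sums of the $A_j$ within each group) that for any bounded scoring scheme $\{a_{jm}\}$ for which $J\sigma_J(\theta) \to \infty$,

$$\frac{1}{\sigma_J(\theta)}\big[\bar A_J - \bar A_J(\theta)\big] \to AN(0,1),$$

given $\Theta = \theta$. Now, for the scoring scheme (11), the item scores are $A_j = (X_j - \theta)/[\theta(1 - \theta)]$, so that $\mathrm{Cov}(A_i, A_j \mid \theta) = c/[\theta(1 - \theta)]$ if $A_i$ and $A_j$ are in the same group, and 0 otherwise. Letting $k_J$ be the greatest integer less than or equal to $J/g_0$, we see that

$$C_J(\theta) = \frac{2}{J}\left[k_J\binom{g_0}{2} + \binom{J - k_J g_0}{2}\right]\frac{c}{\theta(1-\theta)},$$

which is bounded but nonzero as $J \to \infty$. Hence, using scoring scheme (11),

$$J^{1/2}(\hat\theta_J - \theta) \simeq \frac{J^{1/2}\sigma_J(\theta)}{I_J(\theta)}\cdot\frac{\bar A_J - \bar A_J(\theta)}{\sigma_J(\theta)}$$

is asymptotically normal with mean 0 and variance

$$\frac{I_J(\theta) + C_J(\theta)}{I_J^2(\theta)} \to \theta(1 - \theta)\big[1 + c(g_0 - 1)\big].$$

Note that the deviation from the efficient variance $\theta(1 - \theta)$ is indeed due to dependence among the $g_0$ items related to the same paragraph, and that $C_J(\theta) \to c(g_0 - 1)/[\theta(1 - \theta)]$ appropriately characterizes this deviation. This illustrates part (b) of Theorem 5.1. □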

The situation can be understood intuitively as follows: when items in a group are positively correlated given $\theta$, a particular response to one item in the group is likely to be duplicated in responses to other items in the same group. Thus, a wrong response is likely to bias the $\theta$-estimate downward more than is usual, and a right response is likely to bias the estimate upward, the biasing effect being magnified by the size of the group. This inflates the effect of noise inherent in the $\theta$-estimation problem.
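Example 5.1 can also be checked by simulation. The Python sketch below uses one particular way of generating within-group correlation $c$, which is an assumption of the sketch rather than part of the example: each item in a group copies a shared Bernoulli($\theta$) paragraph draw with probability $\sqrt{c}$, and otherwise takes an independent Bernoulli($\theta$) value, which gives $P[X_j = 1 \mid \theta] = \theta$ and within-group correlation exactly $c$. Since $P_j(\theta) = \theta$, the likelihood equation is solved by the proportion correct (when it lies strictly between 0 and 1), so the sample variance of $J^{1/2}(\hat\theta_J - \theta)$ can be compared with $\theta(1-\theta)[1 + c(g_0 - 1)]$.

```python
import random

def simulate_example_5_1(theta, c, g0, n_groups, reps, seed=0):
    # Monte Carlo sketch of Example 5.1 with P_j(theta) = theta and J = g0 * n_groups items.
    # Assumed dependence mechanism: within a group, each item copies a shared
    # Bernoulli(theta) draw with probability sqrt(c), otherwise it is an independent
    # Bernoulli(theta) draw; this gives within-group correlation exactly c.
    rng = random.Random(seed)
    copy_prob = c ** 0.5
    J = g0 * n_groups
    draws = []
    for _ in range(reps):
        total = 0
        for _ in range(n_groups):
            shared = rng.random() < theta          # the "paragraph" component
            for _ in range(g0):
                if rng.random() < copy_prob:
                    total += shared
                else:
                    total += (rng.random() < theta)
        theta_hat = total / J                      # MLE when P_j(theta) = theta
        draws.append(J ** 0.5 * (theta_hat - theta))
    mean = sum(draws) / reps
    return sum((d - mean) ** 2 for d in draws) / (reps - 1)

v = simulate_example_5_1(theta=0.4, c=0.3, g0=5, n_groups=20, reps=5000, seed=1)
```

With these settings the LI value $\theta(1-\theta)$ would be 0.24, while the variance from Example 5.1 is $0.24 \times 2.2 = 0.528$; the simulated value should land near the latter, up to Monte Carlo error.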

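Example 5.2. Now let the sizes of the groups of mutually dependent items increase. We take dichotomous items $X_1, X_2, \ldots$ with identical ICCs $P_j(\theta) = \theta$ as before, but now group them as follows:

$X_1, \ldots, X_{g(1)}$;
$X_{g(1)+1}, \ldots, X_{g(1)+g(2)}$;
etc.,

where $g(k)$ is a nondecreasing function of $k$. For specificity, we will take $g(k) = k^{1/4}$. Once again, each group of $g(k)$ items is independent of the other groups, and for simplicity, we take $\mathrm{Corr}(X_i, X_j \mid \theta) = c$ for $X_i$ and $X_j$ in the same group. We can verify that EI holds for this sequence of items, and apply Liapunov's Central Limit Theorem (Serfling, 1980, p. 30) to the paragraph scores (the sums of the item scores $A_j$ within each group) to conclude that

$$\frac{1}{\sigma_J(\theta)}\big[\bar A_J - \bar A_J(\theta)\big] \to AN(0,1),$$

given $\Theta = \theta$, for any bounded scoring scheme $\{a_{jm}\}$. (Here, Theorem 5.2 does not apply at all, since we shall see below that $C_J(\theta) \to \infty$.) As in the previous example,

$$C_J(\theta) \approx \frac{2}{J}\sum_{k=1}^{k_J}\binom{g(k)}{2}\frac{c}{\theta(1-\theta)},$$

where $k_J$ is chosen so that

$$\sum_{k=1}^{k_J} g(k) \le J < \sum_{k=1}^{k_J+1} g(k);$$

that is, $k_J \approx (5J/4)^{4/5}$ for $g(k) = k^{1/4}$. Thus, $C_J(\theta)$ grows like

$$\frac{c}{\theta(1-\theta)}\cdot\frac{1}{J}\sum_{k=1}^{k_J} g(k)^2 \approx \frac{2}{3}\Big(\frac{5}{4}\Big)^{6/5}\frac{c}{\theta(1-\theta)}\,J^{1/5} \approx 0.871\,\frac{c}{\theta(1-\theta)}\,J^{1/5}.$$

(Incidentally, this also helps establish EI, since it shows that $C_J(\theta)/J \to 0$ as $J \to \infty$.) Hence,

$$J^{2/5}(\hat\theta_J - \theta) \simeq \frac{J^{2/5}\sigma_J(\theta)}{I_J(\theta)}\cdot\frac{\bar A_J - \bar A_J(\theta)}{\sigma_J(\theta)},$$

which is asymptotically normal with mean zero and variance $J^{-1/5}C_J(\theta)/I_J^2(\theta) \approx 0.871\,c\,\theta(1 - \theta)$. Although the asymptotic variance appears lower than the efficient variance $\theta(1 - \theta)$, the rate of convergence of $\hat\theta_J$ to $\theta$ is only $J^{-2/5}$, rather slower than the usual rate $J^{-1/2}$. □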

In this example, the groups of dependent items become so large that the magnified effects of individual item responses have actually slowed the rate of convergence of $\hat\theta_J$ to $\theta$. These magnified effects would be present in any $\theta$-estimation method that ignored the nature of the inter-item dependencies. However, this need not be an argument against using estimation methods that assume local independence when this does not hold. The real lesson is that if one wants to continue to use a familiar estimator like $\hat\theta_J$ even though LI may fail, then one must be able to qualitatively justify an asymptotic distribution assumption like (14), and to quantitatively estimate $C_J(\theta)$ so that realistic standard errors of estimation can be calculated. Note that in Example 5.2, $C_J(\theta)$ is unbounded as $J \to \infty$, but EI still holds; the unboundedness of $C_J(\theta)$ is responsible for the slower rate of convergence of $\hat\theta_J$ to $\theta$. If $C_J(\theta)$ grows too fast as $J \to \infty$, then EI itself can also fail.
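The growth of $C_J(\theta)$ in Example 5.2 can be tabulated directly. Because $k^{1/4}$ is not an integer, the Python sketch below assumes group sizes $g(k) = \lfloor k^{1/4} \rfloor$ (a choice made for the sketch, which does not change the leading-order behavior) and truncates the last group at $J$ items. It returns $C_J(\theta)$ divided by $c/[\theta(1-\theta)]$, that is, $(1/J)\sum_k g(k)[g(k)-1]$; the function names are illustrative.

```python
def fourth_root_floor(k):
    # Exact integer part of k ** (1/4), guarding against floating-point rounding.
    g = int(round(k ** 0.25))
    while g ** 4 > k:
        g -= 1
    while (g + 1) ** 4 <= k:
        g += 1
    return g

def c_over_scale(J, group_size=fourth_root_floor):
    # Returns C_J(theta) / (c / [theta (1 - theta)]) = (1/J) * sum_k g_k (g_k - 1)
    # for J items split into consecutive groups of sizes group_size(1), group_size(2), ...;
    # the last group is truncated so that exactly J items are used.
    used, k, acc = 0, 1, 0
    while used < J:
        g = min(group_size(k), J - used)
        acc += g * (g - 1)
        used += g
        k += 1
    return acc / J

for J in (10 ** 4, 10 ** 6):
    value = c_over_scale(J)
    print(J, round(value, 4), round(value / J ** 0.2, 3))
```

The ratio to $J^{1/5}$ is about 0.66 at $J = 10^4$ and about 0.79 at $J = 10^6$, approaching the limiting constant 0.871 only slowly because of the rounding and lower-order terms, while the ratio to $J$ itself tends to zero, as EI requires.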

The quantity $C_J(\theta)$, or perhaps its average value over all $\theta$'s, should be viewed as an index of departure from local independence, locating collections of items (tests) along a continuum of unidimensional behavior from strictly locally independent unidimensional, $d_L = 1$, situations to dramatically non-unidimensional, $d_E > 1$, situations. This suggests the following model fit/trait estimation taxonomy, based upon the index $C_J(\theta)$ (contingent, of course, upon the qualitative acceptance of (14)):

I. $C_J(\theta) \approx 0$ for all realistic $\theta$'s. In this situation, ability estimation based on a $d_L = 1$ model could proceed as usual, using familiar standard errors such as $I_J(\theta)^{-1/2}$. This situation covers both $d_L = 1$ settings and those essentially unidimensional, $d_E = 1$, settings that only mildly violate LI.

II. $C_J(\theta) \neq 0$ but moderate in size for all realistic $\theta$'s. Here, $\Theta$-estimation procedures based on LI could still be used, but the conventional standard errors would have to be replaced by $[I_J(\theta) + C_J(\theta)]^{1/2}/I_J(\theta)$. This would be the usual $d_E = 1$ setting.

III. $C_J(\theta) \neq 0$ and of substantial size for many $\theta$'s. This would suggest that there is so much residual variability in the data after conditioning on $\Theta$ that some genuinely multidimensional latent trait model may be needed.
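In case II the correction is easy to apply once $I_J(\theta)$ and $C_J(\theta)$ are in hand. The Python sketch below converts both standard errors to the scale of $\hat\theta_J$ itself by dividing by $J^{1/2}$; the function name is illustrative, and the numbers come from Example 5.1 with $\theta = 0.5$, $c = 0.3$, $g_0 = 4$, and $J = 40$, for which $I_J(\theta) = 4$ and $C_J(\theta) = 3.6$. The inflation factor is $(1 + C_J(\theta)/I_J(\theta))^{1/2}$.

```python
import math

def standard_errors(I, C, J):
    # I = I_J(theta) (average information per item), C = C_J(theta), J = test length.
    se_li = 1.0 / math.sqrt(J * I)          # conventional LI standard error of theta_hat_J
    se_ei = math.sqrt((I + C) / J) / I      # case II: [I + C]^{1/2} / I, divided by J^{1/2}
    return se_li, se_ei

se_li, se_ei = standard_errors(I=4.0, C=3.6, J=40)
```

Here the corrected standard error is about 1.38 times the conventional one, so confidence intervals computed as though LI held would be noticeably too short.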

Of course, the practical use of such a taxonomy rests on effective estimation of $C_J(\theta)$ itself. Work recently completed by Nandakumar and Stout (1989) aims at developing a practical index of EI for binary items, related to $C_J(\theta)$ but not adapted to the task of trait estimation. In particular, they investigate empirically the extent to which $d_E = 1$ holds or fails in the paragraph comprehension setting, as the number of items per paragraph increases.

Another approach to estimating $C_J(\theta)$ is suggested by the work of Gibbons, Bock, and Hedeker (1989). With the help of a computational device called the modified Clark algorithm, they are able to factor-analyze binary items, assumed to have normal ogive response curves, with correlated specific factors. $C_J(\theta)$ can then be estimated from the common factor loadings and specific factor correlations, at least when their one-factor solution leads to the same latent trait as identified in the definition of $d_E = 1$.

6. Application to the Dichotomous Case

In the binary (dichotomous) case, in which $X_j$ takes the value 0 or 1 depending on the examinee's answer to the $j$th item, the $d_L = 1$ likelihood is

$$P[\mathbf{X}_J = \mathbf{x}_J \mid \Theta = \theta] = \prod_{j=1}^{J} P_j(\theta)^{x_j}\,\big(1 - P_j(\theta)\big)^{1 - x_j}, \qquad (19)$$

with monotone item characteristic curves (ICCs) $P_j(\theta) = P[X_j = 1 \mid \Theta = \theta]$. Let us assume only that EI and LAD hold with respect to $\Theta$. The definitions and theorems presented in Section 3 all specialize to the dichotomous setting, and in fact, most were introduced in this setting by Stout (1990). The MLE must solve the likelihood equation,

$$0 = \ell_J'(\hat\theta_J) = \sum_{j=1}^{J}\lambda_j'(\hat\theta_J)\big[X_j - P_j(\hat\theta_J)\big], \qquad (20)$$

where $\lambda_j(\theta) = \log[P_j(\theta)/(1 - P_j(\theta))]$ (the use of the log-odds ratios is equivalent to using the log-category-probabilities $\lambda_{j0}$ and $\lambda_{j1}$ from Section 4, and avoids summation over the $n = 2$ response categories). As before, boundedness of $\lambda_j'(\theta)$ together with EI guarantees that $\frac{1}{J}\ell_J'(\theta)$ converges to zero, given $\Theta = \theta$. More precisely, we will assume that, for all $\theta$, there exists an interval $B_\theta$ containing $\theta$ and a constant $M_\theta < \infty$ such that

$$|\lambda_j'(t)| \le M_\theta, \quad \forall\, t \in B_\theta,\ \forall\, j. \qquad (21)$$

To complete a proof of consistency, we again need to bound the test information function away from zero:

$$I_J(\theta) = \frac{1}{J}\sum_{j=1}^{J}\lambda_j'(\theta)\,P_j'(\theta) \ge \epsilon_\theta > 0, \qquad (22)$$
as $J \to \infty$, and as in Proposition 4.1, LAD is a sufficient but not necessary condition to achieve this. Note that the information function in (22) is precisely the same one introduced in (5) for $n = 2$ response categories.

Theorem 6.1. Let $X$ be a dichotomous item sequence satisfying EI, (21), and (22). Then there exists a sequence $\{\hat\theta_J : J \ge J_0\}$ of roots of (20) such that

$$\lim_{J \to \infty} P\big[|\hat\theta_J - \theta| < \epsilon \mid \Theta = \theta\big] = 1,$$

for every $\epsilon > 0$.

Theorem 6.2. Suppose the assumptions of Theorem 6.1 hold, and let $\tilde\theta_J$ be any sequence of consistent estimates of $\theta$, given $\Theta = \theta$. Then the Newton's method improvement,

$$\theta^*_J = \tilde\theta_J - \frac{\ell_J'(\tilde\theta_J)}{\ell_J''(\tilde\theta_J)},$$

is also consistent for $\theta$.

An obvious candidate for the initial guess in Theorem 6.2 is $\tilde\theta_J = \bar P_J^{-1}(\bar X_J)$, where $\bar P_J(\theta) = J^{-1}\sum_j P_j(\theta)$ is the test characteristic curve. From (20) and the above results, we see again that the consistency and asymptotic distribution of $\hat\theta_J$ are tied up with the behavior of the centered weighted averages

$$\bar A_J - \bar A_J(\theta) = \frac{1}{J}\sum_{j=1}^{J} a_j\big[X_j - P_j(\theta)\big], \qquad (23)$$

with $a_j = \lambda_j'(\theta)$, where again the dependence of $a_j$ on $\theta$ does not matter since $\theta$ is fixed.

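Once again, let

$$\sigma_J^2(\theta) = \mathrm{Var}(\bar A_J \mid \theta) = \frac{1}{J^2}\sum_{j=1}^{J}\lambda_j'(\theta)^2 P_j(\theta)\big(1 - P_j(\theta)\big) + \frac{2}{J^2}\sum_{j=2}^{J}\sum_{i=1}^{j-1}\lambda_i'(\theta)\lambda_j'(\theta)\,\mathrm{Cov}(X_i, X_j \mid \theta),$$

and let $C_J(\theta) = \frac{2}{J}\sum_{j=2}^{J}\sum_{i=1}^{j-1}\lambda_i'(\theta)\lambda_j'(\theta)\,\mathrm{Cov}(X_i, X_j \mid \theta)$.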

Theorem 6.3. Suppose that the assumptions of Theorem 6.1 hold for the item sequence $X$ and the latent trait $\Theta$. Also suppose, given $\Theta = \theta$, that in (23),

$$\frac{1}{\sigma_J(\theta)}\big[\bar A_J - \bar A_J(\theta)\big] \to AN(0,1). \qquad (24)$$

Finally, suppose $R(J)$ is a function for which $R^2(J)C_J(\theta)/J$ remains bounded. Then,

$$R(J)(\hat\theta_J - \theta) \to AN\left(0,\ \frac{R^2(J)\,[I_J(\theta) + C_J(\theta)]}{J\,I_J^2(\theta)}\right).$$

Moreover, if $\tilde\theta_J$ is any estimator for which $R(J)(\tilde\theta_J - \theta)$ is bounded in probability, $\theta^*_J$ from Theorem 6.2 is also asymptotically normal with the same asymptotic variance.

Once (24) is deemed qualitatively acceptable, the asymptotic behavior of $\hat\theta_J$ is determined by $C_J(\theta)$. When $C_J(\theta)$ is near zero, we can expect the items to behave as though LI were true; when $C_J(\theta)$ is much larger, we should expect item behavior that can be effectively analyzed only with a multidimensional model.



7. Discussion 

In assessing the shortcomings of the traditional local independence approach to item 
response modeling, Drasgow and Parsons (1983, p. 198) conclude, "it seems clear that
researchers should be more concerned with the robustness of estimation techniques to 
minor violations of dimensionality assumptions than with the possibly neverending task 
of measuring all latent variables that underlie responses in a particular content domain." 
This call for the study of structural robustness in IRT is compelling: Although violations of
strictly unidimensional latent structure can sometimes be explicitly modeled and exploited, 
many situations call for a unidimensional approach that is tolerant of minor violations of
strict unidimensionality. 

In this paper we have extended Stout's modeling notion of essential independence, EI, 
for binary items (Stout, 1987; 1990) to polytomous item sequences. Essential independence 
permits some dependence among items such as would be caused by minor violations of 
local independence, LI, due to nuisance trait multidimensionality, but still allows a single 
dominant latent trait to be identified. This type of mild interitem local dependence is 
arguably more realistic than unidimensional local independence models, $d_L = 1$, for many
currently used ability and achievement tests.

For items in which the expected amount of credit awarded increases with the latent 
trait, we have developed a theory of ability estimation under EI that closely parallels Stout 
(1990). As in Stout's dichotomous response theory, monotonicity need not be assumed
for the individual items, but rather only for the test characteristic curve. Under this
aggregate monotonicity condition, called local asymptotic discrimination, LAD, we have
shown that the transformation $\bar A_J^{-1}(\bar A_J)$ of the raw test score is a consistent estimator of
each examinee's $\theta$ as the test length $J$ grows. A definition of essential unidimensionality,
$d_E = 1$, was proposed based on EI and LAD holding with respect to a unidimensional trait
$\Theta$.

An alternative to scoring the items using an ad hoc scoring scheme $\{a_{jm}\}$ (which leads to the test scores $\bar A_J$ above) is to ignore the local dependence among the items and employ a well-known LI-based estimation procedure. Since it is common to use an LI model even when LI is believed to be only approximately true, the behavior of such a procedure in the more realistic EI setting is an important issue, as Drasgow and Parsons attest above. Maximum likelihood estimation of $\theta$ was examined in this light.

The MLE $\hat\theta_J$ based on a unidimensional LI model was shown to be consistent for each examinee's $\theta$ as the test length $J$ grows, when only EI and not LI holds. In this sense $\hat\theta_J$ is robust as an estimator of $\theta$ under this realistic structural violation of LI. When an estimator such as $\hat\theta_J$ is found to be consistent, its precision as an estimator is usually judged by the theoretical asymptotic distribution of $J^{1/2}(\hat\theta_J - \theta)$. Under LI, we expect this to be asymptotically normal with mean zero and variance $1/I_J(\theta)$ as $J \to \infty$, where $I_J(\theta)$ is the test information function. When $\hat\theta_J$ is based on an LI model but only EI holds, this asymptotic distribution may fail in various ways: the rate $J^{1/2}$ may be preserved but the variance may be inflated by an essentially constant amount; the rate $J^{1/2}$ may fail; and finally, it is conceivable that asymptotic normality itself fails, with any rate of convergence. Hence, the robustness of consistency for the MLE does not extend to a robustness of asymptotic distribution, under EI violations of LI.

Conditions for asymptotic normality of $\hat\theta_J$ involve higher product-moment assumptions that do not admit easy rigorous checks from the data. Hence, asymptotic normality itself is usually a qualitative issue that must be decided by the practitioner in each application. If asymptotic normality is qualitatively acceptable, the correct variance can be calculated with the help of the expression

$$C_J(\theta) = \frac{2}{J}\sum_{j=2}^{J}\sum_{i=1}^{j-1}\mathrm{Cov}(A_i, A_j \mid \Theta = \theta),$$

where the $A_j$'s score each response category according to the derivative of the log-category-probability: $A_j = \lambda_{jX_j}'(\theta)$. In principle, $C_J(\theta)$ could be positive or negative; however, in many educational testing settings, we expect it to be positive. Under EI, $C_J(\theta)/J \to 0$, but $C_J(\theta)$ itself need not tend to zero.

For fixed $J$, the quantity $C_J(\theta)$ should be viewed as an index of local item dependence along a continuum that connects strictly $d_L = 1$ unidimensional models with strictly $d_E > 1$ multidimensional models. Such a continuum has also been suggested by Drasgow and Parsons (1983). The $d_E = 1$ unidimensional models, which are the focus of this paper, form the middle of this continuum. The nearer $C_J(\theta)$ is to zero, the more we can expect latent trait estimation to behave as though LI were true. The larger $C_J(\theta)$, the more we should expect item behavior that can be effectively analyzed only with an explicitly multidimensional latent trait model. Thus, if $C_J(\theta)$ could be effectively estimated in practice, we would be able to use it to predict the behavior of $\hat\theta_J$. Various ideas for doing this are provided by Wainer and Wright (1980), Gibbons, Bock, and Hedeker (1989), and Nandakumar and Stout (1989). For this reason, the non-robustness of the distribution of $\hat\theta_J$ need not be defeating.

The principal assumptions needed to establish consistency of the LI-based MLE were EI and that the information function $I_J(\theta)$ (calculated as though LI were true) be bounded away from 0 and $\infty$. Indeed, a hierarchy of identifiability conditions for estimating $\Theta$ can be developed, starting with cumulative RCC monotonicity M (i.e., ordered-response items), moving through test characteristic curve monotonicity LAD, to the bounding of $I_J(\theta)$ away from 0. Each of these conditions in some sense implies the next, and all allow various forms of unidimensional latent trait estimation. This hierarchy illustrates the transition from highly interpretable but very restrictive conditions, such as M, to less restrictive conditions that do not admit easy psychometric interpretation, such as the bounding conditions on $I_J(\theta)$.
Essential independence plays a central role in the convergence of $\hat\theta_J$ to $\theta$ because it guarantees the stability of certain weighted averages of item scores that appear in the LI-based log-likelihood. Therefore, we might expect that under EI and suitable regularity conditions, other estimators that depend on the stability of the LI-based log-likelihood would also be consistent estimators of $\theta$. Indeed, a trivial modification of the proof of Theorem 4.1 shows that the posterior mode, which maximizes the posterior density

$$f_J(\theta \mid \mathbf{x}_J) = \frac{P[\mathbf{X}_J = \mathbf{x}_J \mid \Theta = \theta]\,f(\theta)}{\int P[\mathbf{X}_J = \mathbf{x}_J \mid \Theta = t]\,f(t)\,dt},$$

is consistent for $\theta$ under the conditions of that theorem and a mild nondegeneracy condition on the density $f(\theta)$ of $\Theta$ in the examinee population. The posterior mode has been considered by Samejima (1969) and by Lord (1986), for example. A different set of regularity conditions from those employed in Theorem 4.1, which are equally plausible in applications, can be used to obtain consistency of the posterior mean,

$$E[\Theta \mid \mathbf{x}_J] = \int \theta\, f_J(\theta \mid \mathbf{x}_J)\,d\theta.$$

Essential independence is used here to ignore the part of the integral away from the value $\theta_0$ that generated the data $\mathbf{x}_J$; see, for example, the proof of equation (5) in Walker (1969). The regularity conditions needed generalize Walker's conditions, and incidentally provide another proof of consistency of the MLE. The posterior mean has been considered by, for example, Bock and Mislevy (1982) as well as earlier by Samejima (1969).
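Both Bayesian summaries are easy to approximate on a grid. The Python sketch below is one such approximation, assuming hypothetical 2PL ICCs in the LI likelihood (19) and a standard normal prior for $\Theta$ (both are assumptions of the sketch, as are the grid limits); because the grid is uniform, its spacing cancels from the posterior mean.

```python
import math

def posterior_mode_and_mean(x, items, lo=-4.0, hi=4.0, n=801,
                            log_prior=lambda t: -0.5 * t * t):
    # Uniform grid on [lo, hi]; log_prior is an unnormalized standard normal log density.
    grid = [lo + (hi - lo) * i / (n - 1) for i in range(n)]

    def log_lik(t):
        # LI (d_L = 1) log-likelihood (19) with 2PL ICCs P_j(t).
        s = 0.0
        for xj, (a, b) in zip(x, items):
            p = 1.0 / (1.0 + math.exp(-a * (t - b)))
            s += math.log(p) if xj else math.log(1.0 - p)
        return s

    log_post = [log_lik(t) + log_prior(t) for t in grid]
    top = max(log_post)
    w = [math.exp(v - top) for v in log_post]               # rescaled to avoid underflow
    mode = grid[log_post.index(top)]                        # posterior mode on the grid
    mean = sum(t * wi for t, wi in zip(grid, w)) / sum(w)   # posterior mean
    return mode, mean

items = [(1.0, 0.0)] * 4
mode, mean = posterior_mode_and_mean([1, 0, 1, 0], items)
```

Unlike the MLE, both summaries remain finite for all-correct or all-incorrect response patterns, because the prior keeps the posterior proper.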

Essential independence is thus seen to be a minimal condition under which strictly $d_L = 1$ trait estimation procedures may be expected to work when applied to mildly multidimensional data. Our examination of essential independence in the polytomous item response setting shows that this condition is not an artifact of the simple structure of dichotomously scored tests, but a general condition that can be fruitfully applied to standardized tests of all sorts. Moreover, we have shown that a rigorous approach to the structural robustness analysis advocated by Drasgow and Parsons (1983) is possible. Locally independent latent trait models can, and should, continue to be used to develop estimation and decision procedures in IRT, if for no other reason than their analytic simplicity. However, before LI-based procedures are applied on-line, they should be thoroughly examined under the more realistic assumption of essential independence.